A Machine Learning Model using Python & Tensorflow to Describe the Market Price of Digital Currencies

The Case of Bitcoin:

Satoshi Nakamoto is the pseudonym used by the original developer of the first successful implementation of a distributed cryptographic payments ledger. The ledger operates on a decentralized network of actors who have, thus far, committed to using it as a mechanism for undermining the current paradigm in which a central authority issues currency and controls a society's money supply$^{1}$. Its roots date back to Wei Dai and his description of "a scheme for a group of untraceable digital pseudonyms to pay each other with money and to enforce contracts amongst themselves without outside help," which he published on the CypherPunks mailing list in 1998$^{2}$. Wei Dai's work, in turn, had its foundations in a cryptographic framework developed in 1975 by a researcher working entirely outside the NSA's orbit.

"The NSA's cryptographic monopoly has evaporated. Two decades ago, no one outside the government, or at least outside the government's control, performed any serious work in cryptography. That ended abruptly in 1975 when a 31-year-old computer wizard named Whitfield Diffie came up with a new system, called "public-key" cryptography, that hit the world of cyphers with the force of an unshielded nuke. The shock wave was undoubtedly felt most vividly in the fortress-like NSA headquarters at Fort Meade, Maryland."$^{8}$

A Cryptic History

Bitcoin is a peer-to-peer, permissionless payment network that enables the issuance and transfer of electronic cash between unknown users, who rely on public-key cryptography to verify the authenticity of the transactions recorded on the public ledger, also known as the blockchain. The reliance on public-key cryptography to feasibly secure a network from attack is discussed in the book published by RSA Press in 2001, titled PKI: Implementing & Managing E-Security, in which the authors argue that "key lengths are chosen so that it is computationally infeasible to try half the keys within the key space even if you use massive numbers of computers over the length of time the data being protected must remain secure."$^{3}$ In fact, RSA Laboratories published a bulletin by Robert Silverman in April 2000 comparing the relative strengths of various key lengths and algorithms: it would take a typical machine 10^16 years to "crack" a 128-bit symmetric key or a 1620-bit RSA key$^{4}$. Bitcoin uses this strength, and the architecture provided by public-key cryptography, to deliver a decentralized approach to security that no longer requires the two parties to a transaction to authenticate themselves in order to establish a trust-based environment$^{5}$. By eliminating the need for trust, Bitcoin removed the on-ramps to fraud and abuse, implementing cryptographic mechanisms that require only a public key to validate a transaction on the network, instead of relying on passwords, which traditionally provide weak security.
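To illustrate the key-length argument above, the following back-of-the-envelope sketch estimates the expected time to search half of a 128-bit key space. The guess rate is an assumed figure chosen purely for illustration, not a benchmark of any real machine:

```python
# Rough estimate of brute-forcing a 128-bit symmetric key.
# The guess rate below is an assumed, illustrative figure.
GUESSES_PER_SECOND = 1e12           # hypothetical: one trillion keys per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

key_space = 2 ** 128                # total number of 128-bit keys
expected_tries = key_space / 2      # on average, half the space is searched

years = expected_tries / GUESSES_PER_SECOND / SECONDS_PER_YEAR
print(f"Expected search time: {years:.2e} years")
```

Even at this generous hypothetical rate, the search takes on the order of 10^18 years, dwarfing the age of the universe and consistent in spirit with the infeasibility claim cited from Silverman's bulletin.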

Bitcoin's value comes from its ability to give society a mechanism in which a central authority is no longer needed to conduct private, frictionless, and immutable transactions on a transparent and secure public ledger. By eliminating the regulated third parties that govern our existing payment networks, Bitcoin reduces transaction fees and lets the initiator of a transaction define them, providing freedom of choice to consumers in this new digital free-market economy. By anonymizing the actors involved in a transaction, it also limits the ability of governments to regulate and legislate the actions of the network's participants. As a distributed network, there is no single central node that can be seized, and no central authority that controls and regulates the entire network. Any node can be shut down without affecting the pervasiveness of the ledger. So long as users of the network retain access to their cryptographic keys, they can access the Bitcoin ledger and the digital currency accounted for within it.

In the 1990s, cryptographic technology was viewed by the US Government as a national security issue; it was classified as a munitions export and regulated as such$^{6}$. This prevented researchers and cryptographers working in the industry from commercializing these tools as products that could meet the growing need for secure communication at an enterprise, and eventually global, scale. The regulatory framework forced these individuals, many of whom were senior officials at technology companies, to work in obscurity while building the privacy tools and digital framework that eventually led to the adoption of Bitcoin as a global digital currency. In 1993, Eric Hughes, considered the founder of the CypherPunks movement, published "A Cypherpunk's Manifesto," in which he outlined the goals of this ad-hoc team of technology professionals who believed that, by side-stepping government sanctions, they could use cryptography to make the world a better place. The values of the movement to guarantee the privacy of our society are outlined as follows:

"We can't expect governments, or other large, faceless organizations to grant us privacy out of their beneficence. It is to their advantage to speak of us, and we should expect that they will speak. To try to prevent their speech is to fight against the realities of information. Information does not just want to be free, it longs to be free. Information expands to fill the available storage space. Information is Rumor's younger, stronger cousin; Information is fleeter of foot, has more eyes, knows more, and understands less than Rumor. We must defend our own privacy if we expect to have any. We must come together and create systems which allow anonymous transactions to take place. People have been defending their own privacy for centuries with whispers, darkness, envelopes, closed doors, secret handshakes, and couriers. The technologies of the past did not allow for strong privacy, but electronic technologies do. We the Cypherpunks are dedicated to building anonymous systems. We are defending our privacy with cryptography, with anonymous mail forwarding systems, with digital signatures, and with electronic money. Cypherpunks deplore regulations on cryptography, for encryption is fundamentally a private act. The act of encryption, in fact, removes information from the public realm. Even laws against cryptography reach only so far as a nation's border and the arm of its violence. Cryptography will ineluctably spread over the whole globe, and with it the anonymous transactions systems that it makes possible. For privacy to be widespread it must be part of a social contract. People must come and together deploy these systems for the common good. Privacy only extends so far as the cooperation of one's fellows in society."$^{7}$

Notable individuals involved in the CypherPunks movement include Julian Assange, Jacob Appelbaum, Steven Bellovin, and Peter Junger.

As the digital economy has matured, the demand for secure digital enablement tools has grown exponentially$^{8}$. In 1995, Adam Back published a cryptographic algorithm based on RSA and suggested that people use it as their email signature as an act of civil disobedience. Announcing HashCash two years later, he wrote the following:

"HashCash is free, all you've got to do is burn some cycles on your PC. It is in keeping with net culture of free discourse, *where the financially challenged can duke it out with millionaires, retired government officials, etc on equal terms."$^{9}$

The freedom sparked by a few lines of code inspired a generation of tools that society could use to improve the quality of human life globally, simply by providing access to distributed forms of secure banking and capital. It was Adam Back who invented HashCash in 1997 as a proof-of-work algorithm: a user's machine expends computational power to solve a cryptographic puzzle, generating a hash stamp that digitally signs and validates an email sent over a network. Mail servers then use the hash stamp, produced by this computationally expensive function, to validate the sender as a legitimate user of the email platform. Its first deployment was as an effective countermeasure against Distributed Denial-of-Service (DDoS) attacks on data networks, and it proved valuable as an anti-spam tool, deterring would-be spammers and DDoS attackers by making their black-hat objectives too computationally expensive to pursue.

The goal of an email spammer, for example, is to send 10,000 or more emails per minute over a broadband connection established under illicit means and false pretenses, infiltrating the inboxes of users globally as quickly as possible before the connection is shut off. The resulting scams and fraudulent schemes cost the global economy 20 billion USD annually, a figure that would be approaching 50 billion USD were it not for private investment in anti-spam technologies such as HashCash$^{10}$. In September 2016, sixty-two percent of spam email carried some form of ransomware$^{11}$. In the first quarter of the same year, 209 million USD was paid to criminals conducting ransomware attacks via spam email infections$^{12}$. Using HashCash to validate messages sent over a network adds only a negligible amount of CPU overhead per email, an inconsequential delay while the CPU performs the calculations required by the proof-of-work algorithm. For spammers and ransomware attackers, however, the same validation cost multiplies across an overwhelming volume of messages, making a scalable spamming campaign far less profitable. Bitcoin and its blockchain use the same concept: the network is so computationally expensive to attack that any attempt would result in a severe financial loss for would-be attackers, in terms of the computational hardware they would need to purchase to compete against all of the nodes coming to consensus across the entire distributed Bitcoin network. Bitcoin does differ from HashCash in that it uses the SHA-256 algorithm as an added security measure. HashCash, moreover, is only the mining function used by the Bitcoin ledger; Bitcoin's broader innovation had never been fully realized in any previously attempted electronic-cash protocol.
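The proof-of-work mechanism just described can be sketched in a few lines: find a nonce whose SHA-256 hash, combined with the message, falls below a difficulty target. This is a simplified, HashCash-style illustration (Bitcoin's production implementation hashes block headers against a dynamically adjusted target), not the actual mining code:

```python
import hashlib

def mine(message: str, difficulty_bits: int) -> int:
    """Find a nonce so that SHA-256(message + nonce) has roughly
    `difficulty_bits` leading zero bits (HashCash-style proof-of-work)."""
    target = 2 ** (256 - difficulty_bits)  # hashes below this value qualify
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{message}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(message: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a stamp costs one hash; producing it cost ~2^difficulty_bits."""
    digest = hashlib.sha256(f"{message}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)

# ~65,000 hashes on average at 16 bits of difficulty
nonce = mine("alice@example.com:2018-08-19", 16)
print(verify("alice@example.com:2018-08-19", nonce, 16))  # True
```

The asymmetry is the whole point: the sender burns CPU cycles to produce the stamp, while the receiver verifies it with a single hash, which is exactly why spam at scale becomes uneconomical.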

"Bitcoin represents a leap forward in electronic cash technology demonstrating for the first time that a respendable, distributed, virtual scarcity based system could be built."$^{13}$

Bitcoin did not begin in 2008 with Satoshi Nakamoto and his famous whitepaper describing the implementation of a distributed network of computers that could come to consensus on a shared version of the ledger using a technologically democratic protocol, the system we know today as Bitcoin$^{5}$. What we now know as Bitcoin, with its open-source blockchain technology, ungoverned by exclusionary policy controls and truly lending itself to use as a global currency and a democratized payments network, has its foundations in the revolutionary code written by Adam Back in 1995$^{14}$, reproduced below:

A Federally Regulated Munitions Export

#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)

The Global Financial Inclusion Problem

In 2014, the World Bank reported that 2 billion people on Earth lacked access to banking, and that just seven countries, China, Indonesia, India, Pakistan, Nigeria, Bangladesh, and Mexico, account for just under half of the world's "unbanked" population$^{15}$. The data suggest that the poor are heavily overrepresented among those without a banking account. It is a hypothesis of this research that, because of its decentralized and digital nature, Bitcoin provides an outlet for the poor to participate in the global economy: it enables people in these communities to transact on a blockchain without policy constraints or trade barriers, allowing those in secluded communities to access a larger global economic network and conduct trade with digital assets. Those without banking, overrepresented among citizens of developing countries, are now afforded the opportunity to build wealth and participate in the technological revolution unfolding in the global financial landscape simply by using Bitcoin and its secure approach to money.

The World Bank cites the following reasons why people in developing countries remain without banking, or are considered "unbanked" (97 percent of respondents cited multiple reasons$^{**}$):$^{15}$

  1. 30 percent of adults without accounts stated that a banking account is not needed due to insufficient funds (3 percent cited this as the only reason).

  2. Banking & transaction costs were cited as a reason for not using banking by 26 percent of respondents.

  3. Another 26 percent of respondents declared that they bank through a family member.

  4. Distance to a financial institution was cited as a barrier to the use of banking by 22 percent of adults without accounts.

  5. Documentation is another significant barrier. 20 percent of people without access to banking simply do not have the documentation required to open an account, as dictated by the central powers that control and regulate the global monetary system.

  6. Distrust of the global financial system was a reason cited by 16 percent of those without access to banking. Latin America, Europe, and Central Asia have the highest rates of distrust.

Stated Need

Addressing the reasons just listed: insufficient funds stem from a lack of opportunity and work. Given the right digital tools, leveraging the 21st-century infrastructure now being implemented in the West, the poor could begin to innovate and build new tools around blockchain technology, which is accessible to the world as an open-source library of documented code, and develop locally based ecosystems to support commercial trade in digital currencies. A perfect case study of how digital enablement can help a community adapt to today's technological landscape is Kenya's organic adoption of M-PESA as a mobile-device-based digital currency.

"Few initiatives in microfinance, or for that matter in development, have been as successful as M-PESA: 3 and a half years after launch, over 70 percent of households in Kenya and more importantly over 50 percent of the poor, unbanked and rural populations use the service. M-PESA’s success means there is a real need for small electronic transactions and storage of value."$^{16}$

By providing a "frictionless" and secure mechanism for sending and receiving "money" between people in a poor community, M-PESA created a market that offers an outlet for entrepreneurial expression. Kenya has proved that access to a free market, with ease of payments between the parties to a transaction, will in fact provide the upward financial mobility that takes people out of poverty.

"Access to M-PESA increased consumption levels over a six-year period, enabling an estimated 186,000 families, or as many as 2 percent of Kenyan households, to move out of poverty. The impact on female-headed households was more than twice the average measured."$^{17}$

Existing payment networks can choose to adapt their Anti-Money-Laundering constraints, and their monetary velocity controls, to meet the needs of populations that do not live under a rule of law satisfying the international regulations within which these centrally controlled actors operate. If existing financial institutions and global payment networks fail to meet the demand of these poor populations for inclusive access to the global economy, the authors of this research posit that the world will continue to see more Fintech disruption coming from the field of blockchain technologies like Bitcoin. We envision that the ultimate answer, for payment networks and financial institutions seeking to leverage blockchain technology for more inclusive access to the global economy, lies in their flexibility: relaxing state documentation requirements, which public keys can supplant, and relaxing Anti-Money-Laundering constraints by using monetary velocity controls to ensure that only micro-transactions are made without sovereign documentation. A flexible approach would afford the free market, and the entrepreneurs who operate within it, the freedom to develop tools that leverage not just the innovation created by blockchain technologies, but also the knowledge and experience embedded in the existing global payment networks.
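The velocity-control idea sketched above could, in principle, be prototyped as a simple rule: undocumented, key-identified users may transact only below a daily micro-transaction cap, while documented users are unrestricted. The function and threshold below are hypothetical illustrations, not any network's actual policy:

```python
from collections import defaultdict

# Hypothetical policy: undocumented (public-key-only) users are capped
# at a small daily volume; documented users are not velocity-limited.
DAILY_MICRO_CAP = 100.00          # illustrative cap, in USD-equivalent

spent_today = defaultdict(float)  # public key -> volume sent today

def authorize(public_key: str, amount: float, documented: bool) -> bool:
    """Approve or reject a transaction under the hypothetical velocity rule."""
    if documented:
        return True               # sovereign documentation: no cap applied
    if spent_today[public_key] + amount > DAILY_MICRO_CAP:
        return False              # would exceed the micro-transaction cap
    spent_today[public_key] += amount
    return True

print(authorize("key-abc", 40.0, documented=False))  # True: under the cap
print(authorize("key-abc", 70.0, documented=False))  # False: would exceed cap
```

A real deployment would need far more (persistence, key clustering, fiat conversion), but the sketch shows how a public key alone can substitute for identity while still bounding money-laundering exposure.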

Bitcoin has given the world the ability to build new tools onto existing infrastructure, providing more opportunities for people to build wealth by conducting secure trade within their local communities without being constrained to them. Bitcoin is an innovative tool that is forcing society to rethink what money is and how we provide access to the economy created by that money. M-PESA offers a clue as to how we can create innovative solutions around digital currency models that have thus far proven to help eradicate poverty.

$^{**}$ To shed light on these reasons, the 2017 Global Findex survey asked adults without an account at a financial institution why they do not have one. Respondents could offer more than one reason, and most gave two.

Transaction Costs

As we begin to understand the nature of these problems at a global level, we need to obtain and quantify data that will allow us to build statistical and machine learning models, models that help us understand the correlation and interdependence of the different asset classes society uses to measure global economic health, and how they relate to the price of Bitcoin. Can we explain the historical context of Bitcoin's adoption, and its emerging status as an asset class, when juxtaposed against the time-series data of a basket of global assets, Bitcoin network data, and Google Search Trends data, in order to test the claim that Bitcoin is the new "Gold Standard" in a perpetually expanding digital economy?
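As a minimal sketch of the correlation measurements such a model requires, the function below computes a Pearson correlation coefficient in pure Python. The two series are fabricated placeholders chosen only to demonstrate the computation, not real market data:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy, fabricated price series purely to demonstrate the computation
btc  = [100, 110, 120, 115, 130, 140]
gold = [50, 54, 58, 57, 63, 68]
print(round(pearson(btc, gold), 3))  # close to 1: the toy series co-move
```

In the analysis that follows, the same coefficient (computed via pandas over Quandl time series) is what lets us quantify how tightly Bitcoin tracks, or diverges from, a given asset class.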

Using Quandl$^{18}$, we commence our analysis with observations of the cost of a transaction on the Bitcoin network, comparing those observations to the cost of conducting a transaction on an existing payments network. Below, we begin accessing the Application Programming Interfaces (APIs) needed to consume the observed data, using Python, to better visualize the model we will now implement.

In [1]:
import quandl as ql

# Configure the Quandl API key and authenticate access
# (substitute your own key)
apiKey = "YOUR_QUANDL_API_KEY"
ql.ApiConfig.api_key = apiKey

# Set dates for the data analysis
start = '2010-08-29'
end = '2018-08-19'

# Capture transaction cost data; the BCHAIN/CPTRV series begins on 2016-01-03
cpTxn = ql.get("BCHAIN/CPTRV", start_date='2016-01-03', end_date=end,
               column_index=1)

# Describe data and find mu, sigma...
cpTxn.describe()
Out[1]:
Value
count 960.000000
mean 1.157900
std 0.552766
min 0.324599
25% 0.780000
50% 1.020000
75% 1.392634
max 4.368342

For the thirty-two-month period from January 2016 to August 2018, the mean Bitcoin transaction cost percentage was ~1.16 percent. Comparatively,

"When it’s all said and done, the average cost of processing payments on Visa, Mastercard, AMEX, or Discover networks for U.S. businesses that do between 10,000 USD and 250,000 USD in annual payments volume is between 2.87 percent and 4.35 percent per transaction."$^{19}$

Bitcoin can thus provide a frictionless method of exchanging a store of value between two counterparties at rates approximately 68 percent lower than those of well-established payment networks in the United States. Cases like M-PESA and Bitcoin continue to provide evidence that these are the sustainable digital currency solutions the market demands: open access to digital financial tools in poor communities globally, fostering incentive-based innovation, entrepreneurship, and locally developed opportunities that tap into a larger, globally connected digital economy. The authors of this paper hypothesize, however, that as Bitcoin approaches the final reward block to be mined, and the network stops rewarding miners in Bitcoin for validating transactions on the blockchain, the price will be affected: miners will need to increase the transaction fees associated with mining, which can lead to parity with the transaction costs of traditional payment networks.
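The approximate 68 percent figure can be reproduced directly from the numbers cited above, comparing the observed mean Bitcoin cost to the midpoint of the quoted card-network range:

```python
btc_mean_cost = 1.157900             # mean from the describe() output above
card_range = (2.87, 4.35)            # cited Visa/Mastercard/AMEX/Discover range
card_midpoint = sum(card_range) / 2  # 3.61 percent

savings = 1 - btc_mean_cost / card_midpoint
print(f"{savings:.0%}")              # → 68%
```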

What will the model we define tell us about the price of Bitcoin when transaction costs are increased or decreased?

Anonymous Banking, Documentation Requirements, and the Distrust of the Financial System

The innovation that Bitcoin brought to the financial world was a permissionless network that approaches the problem of user authentication in a manner diametrically opposed to that of conventional payment networks currently in production. Traditional payment networks typically rely on a small network of distributed servers in data centers to provide the connectivity underlying the payment services that form the backbone of the global economy. These data centers can be thought of as a central node governed by the policy constraints of a central authority, because their architecture was designed for redundancy, maximum uptime, and continuous connectivity. An architecture that connects to a central node, governed by a central authority, requires the establishment of trust among all of its users, who must then abide by whatever regulation the central authority sets for activity on the network. This, in turn, typically and unintentionally limits access to the network for many of the poorest people in our society, as implied by the World Bank data presented in this research. The trust-based model requires that users be authenticated via permission-based mechanisms, typically passwords, device verification methods, biometric data, and other tools that have historically been susceptible to attack, and that continue to provide the on-ramp to fraud and abuse of the payment networks. To combat this, policy controls and legislation, in the form of sovereign documentation requirements, trade embargoes, country-specific sanctions, and other restrictions, are imposed, unintentionally impacting the poorest communities in the developing world.

Bitcoin's approach, in implementing a permissionless network that no longer relies on trust, is to rely instead on public-key cryptography: anyone with a valid public/private key pair can access the system and the digital economy created by this new innovation. The compounding growth in computing power contributed by millions of anonymous actors, acting independently and in their own interests to build a new digital economy, only serves to exponentially strengthen the security and resilience of the Bitcoin network. In effect, Satoshi Nakamoto built a system that dares attackers to drain their computing resources in an attempt to attack it; any such attempt would amount to a "Kamikaze" attack, making no impact on the system as a whole. The blockchain is immutable and accessible to anyone with an internet connection. This permissionless model provides a practical solution for those who distrust a centrally controlled and regulated payments network. It addresses the problem of missing documentation: people in impoverished communities blockaded by a trade embargo or sanction can now access the global digital economy, and the upward financial mobility it offers, without having to establish any form of trust to obtain permission. And it allows those who wish to remain anonymous to access the digital economy directly, rather than through a family member, with nothing more than a public/private key pair.

Covering the Distance

Humans are a resilient species indeed. The lengths to which a typical Kenyan had to go, less than a decade ago, just to access the markets and their financial services validate the need for more direct approaches for those who need access the most. Distance, and the costs associated with travelling across vast stretches of Africa, were cited by the World Bank as a barrier to access for the impoverished communities of the world.

"M-PESA has significantly reduced transaction costs in Kenya. When it was launched the average distance to the nearest bank was 9.2 kilometres. Eight years later in 2015 the average distance to the nearest M-PESA agent was a mere 1.4km."$^{20}$

In combination with low-cost broadband internet services such as SpaceX's new Starlink, the people of the world now have a conduit to the digital economy afforded to them by Bitcoin and other digital currencies, and a clear path to tools that have statistically been shown to help eradicate global poverty, one of the United Nations' defined Sustainable Development Goals. As the adoption of Bitcoin continues to grow, the free market and continued Fintech innovation will provide the tools and platforms needed for people in these poor communities to access the Bitcoin network, enabling their local communities to trade in a digital currency with nothing more than an internet connection. The M-PESA model is quite different from Bitcoin and its associated blockchain; nonetheless, it is a digital currency that grew organically from a free-market demand, met in order to solve the problem these African communities had with fiat currencies and their unstable governments. M-PESA serves as an excellent case study proving the need for solutions modeled around digital currencies like Bitcoin, which software engineers can leverage to build applications with a significant impact on the lives of the poorest people in the world.

How is Bitcoin Money and from Where does Bitcoin Derive its Value?

The solutions Bitcoin provides, its security, anonymity, decentralization, predictable issuance of up to 21 million BTC, and its open access for impoverished communities, addressing the deep-seated and systemic infrastructure weaknesses enabled by conventional financial schemes, all offer valid arguments for the exponential adoption growth we have recently witnessed. To investigate how Bitcoin is money, and from where Bitcoin derives its value, in order to model and describe its price, we must dig deeper into what makes Bitcoin a currency, what makes anything humans use as money, money, and what gives it its inherent value.

Determining how and why a currency holds its inherent value requires a complex understanding of the different constraints and priorities a society may have defined for its chosen currency. It can be argued that the only practical function of money is as a Medium of Exchange, without which there would be no demand for money itself. The only reason we require money is to conduct trade and to satisfy debts with a note that tells the seller that something of value can be claimed with that note. The note itself, the money we create, has no intrinsic value of its own.$^{21}$

Ultimately, money describes physical materials with certain 'ideal' properties that, when adopted by society, exhibit characteristics that allow societies to conduct trade. Money serves one or more of the following three functions:$^{22}$

  1. Medium of Exchange: Money should be portable and easy to trade with. Bitcoin can be used to buy and sell goods and services. OpenBazaar and the now-defunct Silk Road are examples of online marketplaces that allowed buyers and sellers to conduct trade anonymously, without censorship, and without fees, using Bitcoin as the medium of exchange for the goods and services bought and sold on the platform.

  2. Unit of Account: Fungibility, or the ability to have units of value that are indistinguishable from each other for legal purposes.$^{22}$ As a basic function of money, Bitcoin provides a unit of measurement for defining, recording, and comparing the value of goods and services in dollars, other digital currencies or alt-coins, and commodities. Each Bitcoin can be fractionally divided into 100,000,000 units, and each of these units is called a Satoshi: 1 Satoshi = 0.00000001 BTC.

  3. Store of Value: Scarcity is a fundamental consideration when determining whether a currency is a Store of Value. Bitcoin has leveraged Proof-of-Work, and has programmatically hardcoded the total amount of Bitcoin assets that will ever be in circulation, into its mathematically defined scarcity model. Bitcoin, as its fundamental principle, is scarce.
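The scarcity claim in item 3 can be checked directly against Bitcoin's issuance rule: the block subsidy begins at 50 BTC, halves every 210,000 blocks, and is computed in whole satoshis (integer division), which is why the total converges just under 21 million BTC. A minimal sketch of the schedule:

```python
SATOSHIS_PER_BTC = 100_000_000   # 1 BTC = 10^8 satoshis, as noted in item 2
BLOCKS_PER_HALVING = 210_000

subsidy = 50 * SATOSHIS_PER_BTC  # initial block reward, in satoshis
total = 0
while subsidy > 0:
    total += BLOCKS_PER_HALVING * subsidy
    subsidy //= 2                # reward halves, rounded down to whole satoshis

print(total / SATOSHIS_PER_BTC)  # just under 21 million BTC
```

Because the halving eventually rounds the subsidy down to zero satoshis, the supply cap is a consequence of arithmetic, not of any authority's discretion.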

Discussing what it means for a currency to be a store of value is a difficult topic. A currency is considered a store of value because society demands that the money it uses store value. Without that societal demand, and the implicit agreement among the members of society that the note we use to conduct trade is trusted as a legitimate form of value with which a transaction or debt can be satisfied, the money supply would be nothing more than clams, metal, paper, or bits and bytes. The misuse of this implied societal trust has historically caused the hyper-inflation and ultimate collapse of many of the currencies societies have implemented throughout the history of human economics. Bitcoin removes the ability to misuse this trust by enforcing the fact that only 21 million BTC will ever be circulated.

The conversation about money is confusing because we use the single term "money" to describe the three characteristics discussed above, without differentiating them from each other, and without differentiating between our perceptions of how we practically use these terms in our day-to-day lives as participants in the global economy.$^{22}$ Let's discuss this idea further by presenting a use-case for each example listed above, to better understand how we have decided to approach the definition of money, and from where Bitcoin derives its value as an asset class:

  1. Medium of Exchange (MOE): Visa is a great MOE but a poor UofA and a poor SOV.

  2. Unit of Account (UofA): The USD is a great UofA and MOE but a poor SOV.

  3. Store of Value (SOV): Gold is a great SOV but a poor MOE.

Each of these use-cases represents a perception we have about money, based on what we want to use our money for at any particular time, and for any financial decision we may be confronted with when thinking about money within these constraints. Taking these facts into account, and knowing that Bitcoin has been engineered to provide society with these financial properties of money within its very architecture, we must look at the evolution of money, and at where society stands with Bitcoin's implementation today, to completely understand why Bitcoin ultimately derives its value from these considerations.

The Evolution of Money and its Relationship to the Value of a Bitcoin

Money is the foundational aspect of all commerce and, as such, permeates every level of society and human life. The earliest forms of money were precious metals.

"Metals represented for humanity an era when we used the materials we found around us in the world, and adapted them to the uses for which they fit, and soft metals like Gold were molded for our need for "money", while harder metals were used for spears."$^{22}$

Humanity's ability to manipulate the natural world meant that tradeoffs had to be made in adapting the world around us to suit our needs, while remaining subject to the physical limitations of the materials we chose to use as money. Some forms of money were not effective as a Medium of Exchange but suited society's purpose as a Store of Value, as in the case of gold, or vice versa.

"Historically, our implementations of money, have had to compromise with the world around us, and for centuries we have had to adapt our uses of money to the materials available to us, rather than adapting our chosen material to suit our specific need... until now."$^{22}$

When Satoshi Nakamoto unleashed Bitcoin onto the world in 2009, a fundamental shift occurred in our ability to manipulate the materials we use as money for our specific use-case.

"Today, we can engineer the fundamental properties of money."$^{22}$

Society can now customize the properties of money to suit its needs as a Medium of Exchange, a Unit of Account, or a Store of Value. Money can also be engineered to provide features and properties that economists never thought it could or would have. Money can now serve as an immutable and universal ledger of accounts, as in the case of the blockchain ledger implemented by Bitcoin. Money can take the form of a Smart Contract that autonomously mitigates counter-party risk as defined by the parties to a transaction, as in the case of Ethereum. Given that money is now a tool that can be used to engineer solutions to economic problems, we consider two targets defined by the United Nations Development Programme to frame a societal answer to why Bitcoin and other digital currencies have value to society in terms of how we think about and use money.

The Value of a Bitcoin to Society

Target No. 1:

"By 2030, ensure that all men and women, in particular the poor and the vulnerable, have equal rights to economic resources, as well as access to basic services, ownership and control over land and other forms of property, inheritance, natural resources, appropriate new technology and financial services, including microfinance."$^{23}$

Target No. 2:

"Create sound policy frameworks at the national, regional and international levels, based on pro-poor and gender-sensitive development strategies, to support accelerated investment in poverty eradication actions."$^{23}$

The targets listed above are just two of those defined by the United Nations Sustainable Development Goals for the global eradication of poverty, intended to reduce the number of people worldwide who subsist on less than two dollars per day. Taking into consideration the engineered nature of Bitcoin and other digital currencies, and their apparent capacity to address the problems defined by the United Nations above, the authors of this research posit that the evidence discussed presents a socio-economic argument for why Bitcoin is valuable, and has a monetary value, to society in general. In presenting the sociological and economic context of Bitcoin as a new asset class, an objective of this study is to engender a better understanding of the statistical outcomes measured in this work, which uses the Python programming language to quantify and analyze the data gathered, to build a machine learning model, and to present the conclusions of this research.

Machine Learning

Feature Engineering

This research implements a Machine Learning model using Multivariable Linear Regression to describe the price of Bitcoin, using daily data made available by Quandl.$^{18}$ To build the model, the data used contains daily observations of the Bitcoin network's conditions, such as its difficulty, its hash rate, its market capitalization, and its rolling cost per transaction, to name a few. Also taken into consideration is external data on other asset classes, such as gold, which serves to juxtapose the price of Bitcoin against another Store of Value. Using these features to model the price of Bitcoin, the next step is to clean, or preprocess, the data to ensure that the series are aligned and that no missing or null values exist in the data set. Following preprocessing, the selected model is trained; throughout the learning process, data is fed into the model using a learning rate that determines how well the model will perform under evaluation.
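The alignment and gap-filling step just described can be sketched with pandas. The series and values below are illustrative stand-ins for the Quandl features pulled later in this notebook; only the pattern (align on a shared daily index, back-fill, then forward-fill) mirrors the study's preprocessing:

```python
import numpy as np
import pandas as pd

# Two hypothetical daily feature series with gaps on different days.
idx = pd.date_range("2017-01-01", periods=6, freq="D")
price = pd.Series([998.0, 1021.0, np.nan, 1043.0, 1100.0, 1154.0],
                  index=idx, name="price")
hrate = pd.Series([2.3, np.nan, 2.4, 2.6, np.nan, 2.9],
                  index=idx, name="hrate")

# Align on the shared date index, then fill gaps back-first and pad
# forward -- equivalent to the fillna(method='bfill'/'pad') pattern
# applied to each Quandl series in the cells below.
features = pd.concat([price, hrate], axis=1)
features = features.bfill().ffill()

assert not features.isnull().values.any()  # no missing values remain
```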

Furthermore, a statistical testing process will be performed to find the features and independent variables that can explain and describe the price of Bitcoin with a reasonable degree of accuracy and precision. In determining the feature set that impacts and describes the price of Bitcoin, this work will attempt to draw a reasonable set of conclusions and answers for the following questions raised in the introduction of this paper:

  • If Bitcoin provides an outlet for the poor to participate in the global economy, by enabling people in these communities to transact on a blockchain without policy constraints or trade barriers and to access a larger global economic network for trade in digital assets, then what types of solutions can be built to enable the practical execution of these technologies at the local level?

  • Can this research explain the historical context of Bitcoin's adoption, and its emerging status as an asset class, when juxtaposed against the time-series data of a basket of global assets, Bitcoin network data, and Google Search Trends data, to test the claim that Bitcoin is the new "Gold Standard" in a perpetually expanding digital economy?

  • As Bitcoin approaches the mining of the final reward block, and the network stops rewarding miners with new Bitcoin for validating transactions on the blockchain, will the price decrease and force miners to increase the transaction fees associated with mining, potentially leading to parity with the transaction costs of traditional payment networks?

  • What will the machine learning model we define tell us about the price of Bitcoin when transaction costs are increased or decreased?

  • What features and predictors in our model will impact the price of Bitcoin the most?

This model relies on a number of imported Python machine learning libraries which are listed below:

In [2]:
import math
import quandl as ql  # Quandl client; the data cells below call ql.get(...)
from IPython import display
import numpy as np
import pandas as pd
import seaborn as sb
import statsmodels.api as sm
from scipy import stats
from scipy.stats import pearsonr
from itertools import product
import statsmodels.graphics.regressionplots as smplt
from statsmodels.stats.outliers_influence import variance_inflation_factor
from mpl_toolkits.mplot3d import Axes3D
from patsy import dmatrices
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
from pytrends.request import TrendReq
import datetime as dt

# `start` and `end` bound the sample window used by the ql.get and
# truncate calls below; they are set from the study's collection period.

# Configure verbosity
tf.logging.set_verbosity(tf.logging.ERROR)

In converting the data into a series of indicators, or vectors of scalar values, for the analysis and creation of a model to describe Bitcoin and to test the declared claims and hypotheses, below is a mathematical definition of this research's objective as implemented with the Python language and its associated tools and imports, such as TensorFlow.

Each feature in $X$ provides a placeholder needed to train a Multiple Linear Regression model that enables this research to better describe the price of Bitcoin and which externalities and variables impact its price the most.

We have a matrix $X$ containing $m$ days of data. Each day (each row) has $n$ features. Therefore:

$$ \left[\begin{array}{cccc} x^{(1)}_{1} & x^{(1)}_{2} & \ldots & x^{(1)}_{n} \\ x^{(2)}_{1} & x^{(2)}_{2} & \ldots & x^{(2)}_{n} \\ & & \vdots \\ x^{(m)}_{1} & x^{(m)}_{2} & \ldots & x^{(m)}_{n} \end{array}\right] \times \left[\begin{array}{c} w_1 \\ w_2 \\ \vdots \\ w_n \end{array}\right] + b = \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array}\right] $$
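The matrix form above can be checked numerically. The sketch below uses arbitrary values for $X$, $w$, and $b$ (with $m=3$ days and $n=2$ features); only the shapes mirror the equation:

```python
import numpy as np

# y = Xw + b: one fitted value per day (row of X).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # shape (m, n) = (3, 2)
w = np.array([0.5, -1.0])    # shape (n,)
b = 2.0                      # scalar bias broadcast over all m days

y = X @ w + b                # shape (m,)
print(y)  # [ 0.5 -0.5 -1.5]
```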

In [3]:
# Capture historical BTC price data
# Data showing the USD market price from Mt.gox
price = ql.get("BCHAIN/MKPRU", collapse="daily")
price = price.rename(columns={"Value": "price"})

# Fill missing values
price = price.fillna(method='bfill')
price = price.fillna(method='pad')

# Select Sample
price = price.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
price.plot(title='BTC Market Price', figsize=(18,10))
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8115e9b0>
In [4]:
# Bitcoin Difficulty
# Difficulty is a measure of how difficult it is to find a hash below a given target.
diffy = ql.get("BCHAIN/DIFF", collapse="daily")

# Fill missing values
diffy = diffy.fillna(method='bfill')
diffy = diffy.fillna(method='pad')

# Select Sample
diffy = diffy.rename(columns={"Value": "diffy"})
diffy = diffy.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
diffy.plot(title='Network Difficulty', figsize=(18,10))
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e81288da0>
In [105]:
# Bitcoin Average Block Size
# The Average block size in MB
avbls = ql.get("BCHAIN/AVBLS", collapse="daily")

# Fill missing values
avbls = avbls.fillna(method='bfill')
avbls = avbls.fillna(method='pad')

# Select Sample
avbls = avbls.rename(columns={"Value": "avbls"})
avbls = avbls.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
avbls.plot(title='Average Block Size', figsize=(18,10), logy=True)
Out[105]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e9a7f4d68>
In [6]:
# Bitcoin Median Transaction Confirmation Time
# The daily median time taken for transactions to be accepted into a block.
atrct = ql.get("BCHAIN/ATRCT", collapse="daily")

# Fill missing values
atrct = atrct.fillna(method='bfill')
atrct = atrct.fillna(method='pad')

# Select Sample
atrct = atrct.rename(columns={"Value": "atrct"})
atrct = atrct.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
atrct.plot(title='Median Confirmation Time', figsize=(18,10))
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8361ae10>
In [7]:
atrct.describe()
Out[7]:
atrct
count 2901.000000
mean 8.176092
std 4.621553
min 0.000000
25% 6.800000
50% 8.183333
75% 10.700000
max 47.733333

Bitcoin Transaction Confirmation Time

Transaction Confirmation Time is a metric used to argue about the inefficiencies of Bitcoin and its ability to scale to meet the demands met by today's existing payment networks.$^{24}$ As it stands, the average time for a transaction to be confirmed as an entry in Bitcoin's distributed ledger is ~8.17 minutes (see the mean statistic above). It is recommended that a merchant wait for six blocks to be confirmed before delivering any goods to the buyer, to ensure that the funds are legitimate.$^{25}$ This implies that a transaction becomes an immutable entry in the Bitcoin blockchain after an average of ~49 minutes. There are those who argue that this is too long to wait when compared to a traditional payment network that will authorize a transaction in seconds.$^{24}$
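The ~49-minute figure follows directly from the summary statistics above, multiplying the mean median-confirmation time by the six-confirmation merchant guideline:

```python
# Mean of the daily median confirmation time, from atrct.describe() above.
mean_confirmation_min = 8.176092
confirmations = 6  # merchant guideline cited in the text

# Average wait before a payment is treated as an immutable ledger entry.
wait_min = mean_confirmation_min * confirmations
print(round(wait_min, 2))  # ~49.06 minutes
```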

Conversely, there are those who argue that when accepting a credit card via a traditional payment network, the transaction is in fact not immutable and is subject to change, even after settlement. A transaction accepted by a merchant via a payment network like Visa or MasterCard carries chargeback risk for 30 days after the merchant has delivered goods or services to the buyer.$^{26}$ There are payment-industry initiatives working on implementations of RSA encryption techniques that leverage public-key cryptography, as published in EMVCo standards, to shift transaction liability away from the merchant.$^{26}$ Inherent to Bitcoin's architecture, however, all sales are final, as they would be when dealing with cash or gold. This is not to say that there are no protections for consumers using the network: being the only person with access to your private key is the protection Bitcoin offers buyers by leveraging public-key cryptography (Bitcoin's signatures use ECDSA rather than RSA). By confirming a transaction as an immutable entry in the distributed ledger persisted by the Bitcoin blockchain in less than one hour, the Bitcoin network appears to live up to its claim of being the first implementation of a distributed payment network that eliminates the transactional risk merchants carry today, risk that can inadvertently make it difficult to conduct international trade across borders due to cash-flow constraints and risk aversion on the part of a seller.$^{1}$

Bitcoin's ability to confirm a transaction to an immutable ledger in under an hour, without the international merchant incurring cross-border fees, compared to 30 days at best on a traditional payment network, represents a practical commercial example of the innovation and disruption created by the Bitcoin ledger, and highlights another component of the asset that appears to add value to buyers and sellers in a distributed, global economy. Removing the transactional risk that adds friction to the payment environment, and that may contribute to flatter economic growth due to a merchant's aversion to accepting transactional risk across borders, provides another implicit example of the value Bitcoin may add to the global economy, and of how it may contribute to solving the economic problems, defined by the World Bank, that lead to the poverty of billions of people.

By eliminating this transactional risk, poor people with fewer resources are now enabled to trade with the rest of the global community, without fearing a chargeback as a seller, or a loss of investment due to an illegitimate buyer using illicit funds or a stolen credit card. After roughly eight minutes the merchant's Bitcoin transaction is confirmed and becomes an immutable blockchain entry, and the merchant can enjoy the profits without fear or risk of loss. The authors of this work nevertheless question the practicality of scaling to meet the global payments demand of these merchants without the ability to confirm transactions within seconds instead of minutes, given the confirmation-time constraints intrinsic to Bitcoin: its network currently processes about 7 transactions per second with a 1MB block size limit. Comparatively, the Visa payments network achieved a peak of 47,000 transactions per second during the 2013 holiday season.$^{27}$ If Bitcoin were to scale to meet a similar transaction demand under its current design paradigm, it could potentially result in the collapse of the network's processing capabilities, albeit temporarily. In a less extreme case, scaling to meet a large transaction demand would require that a large company with an extensive cloud computing infrastructure step in to meet the distributed computing demands of the network, resulting in the centralization of the Bitcoin nodes, defeating the decentralization attributes touted as a security benefit of Bitcoin and its blockchain ledger, and therefore making the network more susceptible to attack.
There is data to suggest that Microsoft and other large pools of consolidated computing power are in fact centralizing the very nodes that decentralize the network and provide its security, in order to find a commercially practical solution to the problem of slow confirmation times, in essence trading off distributed network security for transaction speed.$^{28}$
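The scale of the throughput gap implied by the figures cited above can be made explicit with a back-of-the-envelope calculation (both rates are taken from the text; the multiple is derived, not sourced):

```python
# Throughput figures cited in the text.
btc_tps = 7            # Bitcoin: ~7 transactions/second at a 1MB block limit
visa_peak_tps = 47_000 # Visa: peak tps observed during the 2013 holidays

# Multiple by which Bitcoin's throughput would need to grow to match
# Visa's cited peak under its current design.
gap = visa_peak_tps / btc_tps
print(round(gap))  # ~6714x
```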

"For bitcoin to succeed, it requires confidence that if it were to become extremely popular, its current advantages stemming from decentralization will continue to exist. In order for people today to believe that Bitcoin will work tomorrow, Bitcoin needs to resolve the issue of block size centralization effects; large blocks implicitly create trusted custodians and significantly higher fees."$^{29}$

Since Bitcoin's technology development is governed by a strong open-source community, the researchers of this paper have no doubt that there will be continued advancement and innovation in the Bitcoin network's scalability and its ability to handle the global transaction demand that appears to be impacting the current price equilibrium of Bitcoin. Whether the solution is the consolidation of nodes under a more central infrastructure provided by a cloud computing leader like Amazon or Microsoft, or an innovative solution developed by the Bitcoin open-source community such as the Bitcoin Lightning Network$^{29}$, Bitcoin appears set to remain a contender in the world of digital currencies and payments for the foreseeable future. As the authors of this paper, we suspect that the average confirmation time metric will ultimately tell us more about the value of Bitcoin as we develop the model described throughout this work.

In [8]:
# Bitcoin Hash Rate
# The estimated number of giga hashes per second 
# (billions of hashes per second) the bitcoin network is performing.
hrate = ql.get("BCHAIN/HRATE", collapse="daily")

# Fill missing values
hrate = hrate.fillna(method='bfill')
hrate = hrate.fillna(method='pad')

# Select Sample
hrate = hrate.rename(columns={"Value": "hrate"})
hrate = hrate.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
hrate.plot(title='Network Hash Rate', figsize=(18,10), logy=True)
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e835d7b38>
In [9]:
# Bitcoin Cost % of Transaction Volume
# Data showing miners revenue as a percentage of the transaction volume.
cptrv = ql.get("BCHAIN/CPTRV", collapse="daily")

# Fill missing values
cptrv = cptrv.fillna(method='bfill')
cptrv = cptrv.fillna(method='pad')

# Select Sample
cptrv = cptrv.rename(columns={"Value": "cptrv"})
cptrv = cptrv.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
cptrv.plot(title='Cost % of Transaction Volume', figsize=(18,10), logy=True)
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8397a438>
In [10]:
cptrv.tail(1000).describe()
Out[10]:
cptrv
count 1000.000000
mean 1.159520
std 0.549162
min 0.324599
25% 0.784436
50% 1.020000
75% 1.391780
max 4.368342

Transaction Cost %

The data suggests that as the network scales to confirm more transactions, which in turn adds more value to the network, the cost of transactions initiated on the network decreases as the value of a Bitcoin increases. Currently the average transaction cost is ~1.15%, or about 68% cheaper than doing business on a traditional payments network. The expectation of this research is to find a relationship between transaction costs and the price of Bitcoin, as they appear to be a factor impacting the increase in the adoption and value of Bitcoin.
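The "~68% cheaper" figure can be reproduced with the mean from cptrv.describe() above, provided one assumes a traditional-network cost of roughly 3.6%; that baseline is an illustrative assumption chosen to make the comparison concrete, not a value sourced from this study:

```python
# Mean Bitcoin cost % of transaction volume, from cptrv.describe() above.
btc_cost_pct = 1.159520

# ASSUMPTION: an all-in traditional card-network cost of ~3.6%, chosen
# so the comparison below can be illustrated; not a sourced figure.
traditional_cost_pct = 3.6

savings = 1 - btc_cost_pct / traditional_cost_pct
print(f"{savings:.0%}")  # ~68% cheaper than the assumed traditional cost
```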

In [11]:
# Bitcoin Estimated Transaction Volume
# Similar to the total output volume with the addition of an algorithm 
# which attempts to remove change from the total value. This may be a more 
# accurate reflection of the true transaction volume.
etrav = ql.get("BCHAIN/ETRAV", collapse="daily")

# Fill missing values
etrav = etrav.fillna(method='bfill')
etrav = etrav.fillna(method='pad')

# Select Sample
etrav = etrav.rename(columns={"Value": "etrav"})
etrav = etrav.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
etrav.plot(title='Est. Transaction Volume', figsize=(18,10), logy=True)
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e858d8b38>
In [12]:
# Bitcoin Total Output Volume
# The total value of all transaction outputs per day. This includes coins 
# which were returned to the sender as change.
toutv = ql.get("BCHAIN/TOUTV", collapse="daily")

# Fill missing values
toutv = toutv.fillna(method='bfill')
toutv = toutv.fillna(method='pad')

# Select Sample
toutv = toutv.rename(columns={"Value": "toutv"})
toutv = toutv.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
toutv.plot(title='Total Output Volume', figsize=(18,10), logy=True)
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e86300a20>
In [13]:
# Bitcoin Number of Transactions per Block
# The average number of transactions per block.
ntrbl = ql.get("BCHAIN/NTRBL", collapse="daily")

# Fill missing values
ntrbl = ntrbl.fillna(method='bfill')
ntrbl = ntrbl.fillna(method='pad')

# Select Sample
ntrbl = ntrbl.rename(columns={"Value": "ntrbl"})
ntrbl = ntrbl.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
ntrbl.plot(title='Number of Transactions per Block', figsize=(18,10), logy=True)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e85d31cc0>
In [14]:
# Bitcoin Number of Unique Bitcoin Addresses Used
# Number of unique bitcoin addresses used per day.
naddu = ql.get("BCHAIN/NADDU", collapse="daily")

# Fill missing values
naddu = naddu.fillna(method='bfill')
naddu = naddu.fillna(method='pad')

# Select Sample
naddu = naddu.rename(columns={"Value": "naddu"})
naddu = naddu.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
naddu.plot(title='Number of Unique Bitcoin Addresses Used', figsize=(18,10), logy=True)
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e862e6c88>
In [15]:
# Bitcoin Number of Transactions Excluding Popular Addresses
# Data showing the total number of unique bitcoin transactions per day excluding those 
# which involve any of the top 100 most popular addresses.
ntrep = ql.get("BCHAIN/NTREP", collapse="daily")

# Fill missing values
ntrep = ntrep.fillna(method='bfill')
ntrep = ntrep.fillna(method='pad')

# Select Sample
ntrep = ntrep.rename(columns={"Value": "ntrep"})
ntrep = ntrep.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
ntrep.plot(title='Number of Transactions Excluding Popular Addresses', figsize=(18,10), logy=True)
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8531e0b8>
In [16]:
# Bitcoin Total Number of Transactions
# Total number of unique bitcoin transactions per day.
ntrat = ql.get("BCHAIN/NTRAT", collapse="daily")

# Fill missing values
ntrat = ntrat.fillna(method='bfill')
ntrat = ntrat.fillna(method='pad')

# Select Sample
ntrat = ntrat.rename(columns={"Value": "ntrat"})
ntrat = ntrat.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
ntrat.plot(title='Total Number of Transactions', figsize=(18,10), logy=True)
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e87b14a20>
In [17]:
# Bitcoin Number of Transactions
# Total number of unique bitcoin transactions per day.
ntran = ql.get("BCHAIN/NTRAN", collapse="daily")

# Fill missing values
ntran = ntran.fillna(method='bfill')
ntran = ntran.fillna(method='pad')

# Select Sample
ntran = ntran.rename(columns={"Value": "ntran"})
ntran = ntran.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
ntran.plot(title='Number of Transactions', figsize=(18,10), logy=True)
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e87ca8b70>
In [18]:
# Bitcoin Total Transaction Fees
# Data showing the total BTC value of transaction fees miners earn per day.
trfee = ql.get("BCHAIN/TRFEE", collapse="daily")

# Fill missing values
trfee = trfee.fillna(method='bfill')
trfee = trfee.fillna(method='pad')

# Select Sample
trfee = trfee.rename(columns={"Value": "trfee"})
trfee = trfee.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
trfee.plot(title='Total Transaction Fees', figsize=(18,10), logy=True)
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e880816a0>
In [19]:
# Total Bitcoins
# Data showing the historical total number of bitcoins which have been mined.
totbc = ql.get("BCHAIN/TOTBC", collapse="daily")

# Fill missing values
totbc = totbc.fillna(method='bfill')
totbc = totbc.fillna(method='pad')

# Select Sample
totbc = totbc.rename(columns={"Value": "totbc"})
totbc = totbc.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
totbc.plot(title='Total Bitcoins', figsize=(18,10), logy=True)
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8732ca90>
In [20]:
# Bitcoin Miners Revenue
# Historical data showing: 
# (number of bitcoins mined per day + transaction fees) * market price.
mirev = ql.get("BCHAIN/MIREV", collapse="daily")

# Fill missing values
mirev = mirev.fillna(method='bfill')
mirev = mirev.fillna(method='pad')

# Select Sample
mirev = mirev.rename(columns={"Value": "mirev"})
mirev = mirev.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
mirev.plot(title='Miners Revenue', figsize=(18,10), logy=True)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e88d05a20>
In [21]:
# Bitcoin Cost Per Transaction
# Data showing miners revenue divided by the number of transactions.
cptra = ql.get("BCHAIN/CPTRA", collapse="daily")

# Fill missing values
cptra = cptra.fillna(method='bfill')
cptra = cptra.fillna(method='pad')

# Select Sample
cptra = cptra.rename(columns={"Value": "cptra"})
cptra = cptra.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
cptra.plot(title='Cost Per Transaction', figsize=(18,10), logy=True)
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8a3d8780>
In [22]:
# Bitcoin USD Exchange Trade Volume
# Data showing the USD trade volume from the top exchanges.
trvou = ql.get("BCHAIN/TRVOU", collapse="daily")

# Fill missing values
trvou = trvou.fillna(method='bfill')
trvou = trvou.fillna(method='pad')

# Select Sample
trvou = trvou.rename(columns={"Value": "trvou"})
trvou = trvou.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
trvou.plot(title='USD Exchange Trade Volume', figsize=(18,10), logy=True)
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e89db1ba8>
In [23]:
# Bitcoin Estimated Transaction Volume USD
# Similar to the total output volume with the addition of an algorithm 
# which attempts to remove change from the total value. This may be a 
# more accurate reflection of the true transaction volume.
etrvu = ql.get("BCHAIN/ETRVU", collapse="daily")

# Fill missing values
etrvu = etrvu.fillna(method='bfill')
etrvu = etrvu.fillna(method='pad')

# Select Sample
etrvu = etrvu.rename(columns={"Value": "etrvu"})
etrvu = etrvu.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
etrvu.plot(title='Est. Transaction Volume USD', figsize=(18,10), logy=True)
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8a57cf28>
In [24]:
# Bitcoin Total Transaction Fees USD
# Data showing the total BTC value of transaction fees miners earn per day in USD.
trfus = ql.get("BCHAIN/TRFUS", collapse="daily")

# Fill missing values
trfus = trfus.fillna(method='bfill')
trfus = trfus.fillna(method='pad')

# Select Sample
trfus = trfus.rename(columns={"Value": "trfus"})
trfus = trfus.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
trfus.plot(title='Total Transaction Fees USD', figsize=(18,10), logy=True)
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8c56c278>
In [25]:
# Bitcoin Market Capitalization
# Data showing the total number of bitcoins in circulation at the market price in USD.
mktcp = ql.get("BCHAIN/MKTCP", collapse="daily")

# Fill missing values
mktcp = mktcp.fillna(method='bfill')
mktcp = mktcp.fillna(method='pad')

# Select Sample
mktcp = mktcp.rename(columns={"Value": "mktcp"})
mktcp = mktcp.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
mktcp.plot(title='BTC Market Capitalization', figsize=(18,10), logy=True)
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8a5e5390>
In [26]:
# Bitcoin api.blockchain Size
# The total size of all block headers and transactions. 
# Not including database indexes.
blchs = ql.get("BCHAIN/BLCHS", collapse="daily")

# Fill missing values
blchs = blchs.fillna(method='bfill')
blchs = blchs.fillna(method='pad')

# Select Sample
blchs = blchs.rename(columns={"Value": "blchs"})
blchs = blchs.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
blchs.plot(title='BTC api.blockchain Size', figsize=(18,10), logy=True)
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8c442ac8>
In [27]:
# Bitcoin My Wallet Number of Transaction Per Day
# Number of transactions made by MyWallet Users per day.
mwntd = ql.get("BCHAIN/MWNTD", collapse="daily")

# Fill missing values
mwntd = mwntd.fillna(method='bfill')
mwntd = mwntd.fillna(method='pad')

# Select Sample
mwntd = mwntd.rename(columns={"Value": "mwntd"})
mwntd = mwntd.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
mwntd.plot(title='BTC Wallet Transactions Per Day', figsize=(18,10), logy=True)
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8d186c88>
In [28]:
# Bitcoin My Wallet Number of Users
# Number of wallets hosts using our MyWallet Service.
mwnus = ql.get("BCHAIN/MWNUS", collapse="daily")

# Fill missing values
mwnus = mwnus.fillna(method='bfill')
mwnus = mwnus.fillna(method='pad')

# Select Sample
mwnus = mwnus.rename(columns={"Value": "mwnus"})
mwnus = mwnus.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
mwnus.plot(title='BTC Wallet Number of Users', figsize=(18,10), logy=True)
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8d934588>
In [29]:
# Bitcoin My Wallet Transaction Volume
# 24hr Transaction Volume of our MyWallet service.
mwtrv = ql.get("BCHAIN/MWTRV", collapse="daily")

# Fill missing values
mwtrv = mwtrv.fillna(method='bfill')
mwtrv = mwtrv.fillna(method='pad')

# Select Sample
mwtrv = mwtrv.rename(columns={"Value": "mwtrv"})
mwtrv = mwtrv.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
mwtrv.plot(title='BTC Wallet Transaction Volume', figsize=(18,10), logy=True)
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8d951320>
In [30]:
# Gold Data
# Historical Futures Prices: Gold Futures, Continuous Contract 
# 4. Non-adjusted price based on spot-month continuous contract calculations. Raw data from CME.
#############################################
gld = ql.get("CHRIS/CME_GC4", start_date=start, end_date=end, collapse="daily")

# Fill missing values
gld = gld.fillna(method='bfill')
gld = gld.fillna(method='pad')

# Select Sample
gld = gld.rename(columns={"Last": "gld"})
gld.gld.plot(title='Gold', figsize=(18,10), logy=True)
Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8d0130b8>
In [31]:
# Crude Oil Data
# Historical Futures Prices: Crude Oil Futures, Continuous Contract 
# 2. Non-adjusted price based on spot-month continuous contract calculations. Raw data from CME.
#############################################
oil = ql.get("CHRIS/CME_CL2", start_date=start, end_date=end, collapse="daily")

# Fill missing values
oil = oil.fillna(method='bfill')
oil = oil.fillna(method='pad')

# Select Sample
oil = oil.rename(columns={"Last": "crude"})
oil.crude.plot(title='Crude Oil', figsize=(18,10), logy=True)
Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8ae599b0>
In [32]:
# 10-Year Bond Yield Data
# U.S. Treasury Bond Futures, Continuous Contract 
# 1. Non-adjusted price based on spot-month continuous contract calculations. Raw data from CME.
#############################################
tenYr = ql.get("CHRIS/CME_US1", start_date=start, end_date=end, collapse="daily")

# Fill missing values
tenYr = tenYr.fillna(method='bfill')
tenYr = tenYr.fillna(method='pad')

# Select Sample
tenYr = tenYr.rename(columns={"Last": "bond"})
tenYr.bond.plot(title='Ten-Year Bond Yields', figsize=(18,10), logy=True)
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8d04f9e8>
In [33]:
# USD Index Futures Data
# Historical Futures Prices: US Dollar Index Futures, Continuous Contract 
# 1. Non-adjusted price based on spot-month continuous contract calculations. Raw data from ICE.
#############################################
usd = ql.get("CHRIS/ICE_DX1", start_date=start, end_date=end, collapse="daily")

# Fill missing values
usd = usd.fillna(method='bfill')
usd = usd.fillna(method='pad')

# Select Sample
usd = usd.rename(columns={"Settle": "usd"})
usd.usd.plot(title='USD Price Index Futures', figsize=(18,10), logy=True)
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8f702eb8>
In [34]:
# S&P500 Futures Data
# Historical Futures Prices: E-mini S&P 500 Futures, Continuous Contract 
# 1. Non-adjusted price based on spot-month continuous contract calculations. Raw data from CME.
#############################################
sp500 = ql.get("CHRIS/CME_ES1", start_date=start, end_date=end, collapse="daily")

# Fill missing values
sp500 = sp500.fillna(method='bfill')
sp500 = sp500.fillna(method='pad')

# Select Sample
sp500 = sp500.rename(columns={"Last": "sp500"})
sp500.sp500.plot(title='S&P500 Futures', figsize=(18,10), logy=True)
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8f6a2ac8>
A Proxy for Market Demand

After obtaining asset pricing data inherent to the operation of the Bitcoin distributed ledger, along with pricing data for other asset classes such as oil, gold, the S&P 500, the USD Price Index, and 10-Year Bonds, this research also takes market demand into consideration, using GoogleTrends$^{30}$ search data as a proxy for that demand, in an attempt to model the behavior of the participants involved in the market. Using search trend data provided by Google, the model implemented in this research approximates microeconomic price equilibrium by measuring a component of supply and demand, essentially quantifying the daily search requests of those who use Google to actively seek out more information on Bitcoin and how to obtain it. By gaining insight into market demand as measured by the search requests of the global population, the authors of this research expect to find a close relationship between the price of Bitcoin and the volume of search requests made for Bitcoin.

In [35]:
#### connect to google
_pytrends = TrendReq(hl='en-US', tz=360)

#### build the payload
_kw_list = ["bitcoin"]  
_cat = 0
_geo = ''
_gprop = '' 
In [36]:
# dates can be formatted as `2017-12-07 2018-01-07`, `today 3-m`, or `today 5-y`; check trends.google.com's URL
_date_fmt = '%Y-%m-%d'

# Create and format a new list of datetime objects using list of strings representing dates needed 
_start_date, _end_date = map(lambda x : dt.datetime.strptime(x, _date_fmt)
                           , [start, end])

_end_date - _start_date
Out[36]:
datetime.timedelta(2912)
In [37]:
### Build an array of 90-day periods to retrieve Google Trends data at a one-day resolution
# _60d_periods stores the total number of 90-day periods between the start & end dates
_60d_periods = math.ceil( (_end_date - _start_date) / dt.timedelta(days=90) )

_60d_periods
Out[37]:
33
In [38]:
# _tmp_range is a list of dates separated by 90 days. We need one more than the number of _60d_periods.
# If _end_date is in the future, Google returns the most recent data
_tmp_range = pd.date_range(start= _start_date, periods= _60d_periods + 1, freq= '90D')
In [39]:
# Make the list of `_start_date _end_date` strings, each pair separated by a space,
# by joining consecutive _tmp_range values into a list of 90-day periods
_rolling_dates = [ ' '.join(map(lambda x : x.strftime(_date_fmt), 
                                [_tmp_range[i], _tmp_range[i+1] ])
                            )
                    for i in range(len(_tmp_range)-1) ]
In [40]:
# initialization of the main data frame _df_trends
# _dates will contain our last payload argument
_dates = _rolling_dates[0]
_pytrends.build_payload(_kw_list, cat=_cat, timeframe=_dates, geo=_geo, gprop=_gprop)
_df_trends = _pytrends.interest_over_time()

for _dates in _rolling_dates[1:] :
    # we need to normalize data before concatenation
    
    _common_date = _dates.split(' ')[0]
    
    _pytrends.build_payload(_kw_list, cat=_cat, timeframe=_dates, geo=_geo, gprop=_gprop)
    _tmp_df = _pytrends.interest_over_time()
        
    _multiplication_factor = _df_trends.loc[_common_date] / _tmp_df.loc[_common_date]
    
    # _df_trends contains the normalized Trends data
    _df_trends = (pd.concat([_df_trends, 
                             (_tmp_df[1:] * _multiplication_factor)])
                  .drop(labels = 'isPartial', axis = 1)  # drop the isPartial flag
                  .resample('D', closed='right').bfill()  # making sure that we have one value per day.
                 )
In [41]:
gglTrnd = _df_trends

gglTrnd = pd.to_numeric(gglTrnd['bitcoin'])
In [42]:
# Create a Google Trend Object
# Create 3 periods to normalize against
totalTrend = TrendReq(hl='en-US', tz=360)

# Declare a var to store the search term
#### build the payload
kw_list = ["bitcoin"]  
_cat = 0
_geo = ''
_gprop = ''
time = start + ' ' + end

# Build payload request to get data from Google trends
# timeframe='2009-01-03 2018-05-26'
# timeframe='2009-01-03 2013-06-26'
# timeframe='today 5-y'
totalTrend.build_payload(kw_list, cat=_cat, timeframe=time, geo=_geo, gprop=_gprop)

# Get interest over time
# Capture Monthly Data for use in Normalization against Weekly
totalTrend = totalTrend.interest_over_time()
In [43]:
# Plot the Interest
totalTrend.plot(title='Google Trends Monthly Data Points', figsize=(20,10))
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e8d8b58d0>

Below is the normalized Google Trends series produced by the program above, displaying search interest at a daily frequency.$^{31}$
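The window-stitching approach used in the program above can be sketched in isolation. With hypothetical numbers, two overlapping windows, each scaled 0-100 independently by Google, are joined by rescaling the second with the ratio of their values at the shared date:

```python
import pandas as pd

# Two hypothetical 0-100 scaled windows that share the date 2017-01-05
w1 = pd.Series([40.0, 60.0, 80.0],
               index=pd.date_range('2017-01-03', periods=3))
w2 = pd.Series([100.0, 75.0, 50.0],
               index=pd.date_range('2017-01-05', periods=3))

# Rescale the second window so the shared date agrees with the first,
# then append the remainder of the second window
factor = w1.loc['2017-01-05'] / w2.loc['2017-01-05']
stitched = pd.concat([w1, w2[1:] * factor])
print(stitched)
```

Because each window is normalized to its own maximum, the multiplication factor puts every window on the scale of the first, which is what `_multiplication_factor` does in the loop above.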

In [44]:
gglTrnd.plot(title='Normalized Google Trends Daily Data', figsize=(20,10))#, logy=True)
Out[44]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e90842780>

PreProcessing & Descriptive Analysis

Proceeding further, we use Python to derive a measure of daily market performance by calculating the daily percentage return, and we persist it to the set of all features so that it can be compared later in this document against the significant features found through statistical testing.

In [45]:
# Shift price data to compare each day with the previous and next day
price['pshift'] = price.price.shift() # Default periods = 1
price['plag'] = price.price.shift(periods=-1) # Offset periods = -1

# Calculate one period percentage change (xt/xt-1)
price['pchange'] = price.price.div(price.pshift)

# Calculate return
price['ret'] = price.pchange.sub(1).mul(100)

price[['price', 'pshift', 'pchange', 'ret']].tail(3)
Out[45]:
price pshift pchange ret
Date
2018-08-17 6342.629231 6362.676923 0.996849 -0.315083
2018-08-18 6476.900000 6342.629231 1.021170 2.116958
2018-08-19 6436.720833 6476.900000 0.993797 -0.620346
In [46]:
# concatenate all variables into one set of all features
frames = [price['price'], price['ret'], diffy, mwntd, mwnus, mwtrv, avbls, blchs,
          atrct, hrate, cptrv, etrav, toutv, ntrbl, naddu,
          ntrep, ntrat, ntran, trfee, totbc, mirev, cptra,
          trvou, etrvu, trfus, mktcp, gld.gld, oil.crude, tenYr.bond, usd.usd, sp500.sp500, gglTrnd]
In [47]:
btcData = pd.concat(frames, axis=1)
btcData.shape
Out[47]:
(2914, 32)
In [48]:
# Describe the complete set of all features
btcData.describe()
Out[48]:
price ret diffy mwntd mwnus mwtrv avbls blchs atrct hrate ... trvou etrvu trfus mktcp gld crude bond usd sp500 bitcoin
count 2913.000000 2912.000000 2.913000e+03 2913.000000 2.889000e+03 2.913000e+03 2913.000000 2913.000000 2901.000000 2.913000e+03 ... 2.913000e+03 2.913000e+03 2.913000e+03 2.913000e+03 1997.000000 2010.000000 2009.000000 2052.000000 2026.000000 2914.000000
mean 1383.031314 0.605613 4.522343e+11 30705.032269 5.507746e+06 1.146569e+05 0.401249 46198.306683 8.176092 3.447819e+06 ... 1.127787e+08 2.698727e+08 2.902619e+05 2.269900e+10 1367.451377 74.788930 145.530783 87.363777 1867.691777 15436.298681
std 2928.310405 7.038008 1.124557e+12 29488.668629 7.504610e+06 2.712013e+05 0.362587 53780.736529 4.621553 8.611323e+06 ... 3.409114e+08 6.196482e+08 1.375379e+06 4.952617e+10 188.168884 23.244625 12.346814 8.295908 472.696861 33587.371972
min 0.060900 -64.628571 6.233870e+02 0.000000 2.000000e+00 0.000000e+00 0.000419 1.000000 0.000000 4.617321e-03 ... 5.131510e+01 5.560000e+02 0.000000e+00 2.392845e+05 1051.700000 28.330000 117.312500 73.107000 1045.000000 0.000000
25% 12.178870 -1.355300 2.440643e+06 222.000000 1.840500e+04 0.000000e+00 0.086244 2287.000000 6.800000 1.819875e+01 ... 5.121155e+05 2.378989e+06 2.672012e+02 1.146124e+08 1238.300000 50.842500 136.750000 80.135750 1398.812500 1545.241184
50% 292.870000 0.096843 2.384467e+10 25790.000000 2.011246e+06 6.139049e+04 0.252554 20873.000000 8.183333 1.908373e+05 ... 4.680273e+06 5.057706e+07 5.698838e+03 4.143624e+09 1309.500000 81.580000 145.281250 83.958000 1922.875000 5862.101509
75% 703.710000 1.880909 2.173755e+11 54852.000000 8.156634e+06 1.588788e+05 0.758401 79799.200000 10.700000 1.619256e+06 ... 3.141146e+07 1.831560e+08 4.003936e+04 1.024484e+10 1484.900000 96.405000 153.437500 95.227500 2137.687500 12208.162487
max 19498.683333 173.010920 6.389317e+12 122796.000000 2.781125e+07 8.484600e+06 1.179159 179361.207839 47.733333 5.399449e+07 ... 5.352016e+09 5.760245e+09 2.272484e+07 3.265254e+11 1893.600000 114.430000 177.093750 103.288000 2874.750000 469986.093891

8 rows × 32 columns

Price Performance Distribution

A distribution plot describing the frequency of "Down Days", days with a close lower than the previous day's closing price, versus "Up Days", days with a close higher than the previous day's closing price. This feature will be used as a categorical label further along in this work to visualize the relationship of each feature against price according to its daily performance. The output implies that speculation in this market through an intermediary may prove difficult, as the results display more days with a negative price performance than days with a positive one. When coupled with the leverage offered by exchanges to facilitate speculation, it would appear that, statistically, the average person's ability to turn a speculative profit on any given day is no better than chance. From a speculative perspective, the distribution of price against the daily performance of Bitcoin does not provide strong evidence that the market favors any given direction; if anything, the distribution plot below, when considering daily performance, indicates that the market currently leans toward negative performance, making a trade that results in a financial loss seem likely. We will continue to analyze the remaining data points to find any features that may indicate whether the performance of the market is significantly impacted by any of the predictors in the set of all features used in this paper.

In [49]:
# Truncate values before 2010-08-29 to eliminate
# infinite values generated from BTC startup
hprice = price.truncate(before=pd.Timestamp(start), after=pd.Timestamp(end))
In [50]:
# Distribution Plots
plt.figure(figsize = (20,10))

sb.distplot(hprice['pchange'][hprice['ret'] < 0], bins=30, label = 'Market Down Days')

sb.distplot(hprice['pchange'][hprice['ret'] >= 0], bins=30, label = 'Market Up Days')

plt.legend()
plt.show()
c:\program files\python36\lib\site-packages\matplotlib\axes\_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
  warnings.warn("The 'normed' kwarg is deprecated, and has been "

Feature Distribution

Because it is useful to understand the distribution of the predictors in a linear regression model, in order to find influential outliers or concentrated values, the process continues with a plot of the distribution of each feature in the set of all features. From the plots below we can observe the distributions of our dependent and independent variables. A highly skewed dependent variable, in this case price, may be made more symmetric with a transformation, to better approximate the symmetry and homoscedasticity of the residuals when fitting the regression model later. For the independent variables, however, siding with the statistical argument that 'normality' is inapplicable in this use case, because the independent values in this study are taken as a fixed rather than a random sampling of data, this research avoids transformations aimed at achieving a more linear relationship with the response variable: the re-interpretation of coefficients adds complexity, and such transformations risk overfitting the data used in these tests when the model is trained to predict a specific outcome.$^{32}$

Because of the wide variation in units across the data set, this research standardizes the regression inputs into z-scores, which are then measured in a Spearman rank correlation matrix to find the variables whose correlations with the response variable, price, are most statistically significant. The goal of this study is to find the variables, and their coefficients, that impact the prediction of price with a reasonable degree of statistical significance without overfitting the values to the regression model. Moving forward, a Variance Inflation Factor will be calculated after the initial iteration of the regression model to detect any multicollinearity among the variables that may be impacting the model, considering the extreme skewness of the distributions observed below.

In [51]:
fig = btcData.hist(bins=30,
                 color='steelblue',
                 edgecolor='black', linewidth=1.0,
                 xlabelsize=10, ylabelsize=10,
                 xrot=45, yrot=0,
                 figsize=(10,9),
                 grid=False)

plt.tight_layout(rect=(0, 0, 3, 3))   

Standardization of Regression Inputs

When solving a multiple regression, it can be argued that all variables, both independent and dependent, should be standardized. A predictor, or feature, is standardized into z-scores by subtracting its mean from each value in the set of all values for the feature, and then dividing these new values by the standard deviation of that set. Standardizing the predictors in a multiple regression yields standardized regression coefficients that eliminate each variable's scale of units, expressing the change in the response in standard deviations so that the features can be compared against each other more easily.
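As a minimal sketch of the transformation described above (with made-up values), manual standardization agrees with `scipy.stats.zscore`, which is used in the cells below:

```python
import numpy as np
from scipy import stats

# Hypothetical feature values
x = np.array([2.0, 4.0, 6.0, 8.0])

# Manual standardization: subtract the mean, divide by the standard deviation
z_manual = (x - x.mean()) / x.std()

# scipy.stats.zscore applies the same population (ddof=0) standardization
z_scipy = stats.zscore(x)

print(np.allclose(z_manual, z_scipy))  # True: the two computations agree
```

The resulting z-scores have mean 0 and standard deviation 1, which is what allows the coefficients of differently scaled features to be compared directly.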

In [52]:
# Chosen features selected from predictors with highest Spearman correlation study pvals
predictors = [diffy, mwntd, mwnus, mwtrv, avbls, blchs,
          atrct, hrate, cptrv, etrav, toutv, ntrbl, naddu,
          ntrep, ntrat, ntran, trfee, totbc, mirev, cptra,
          trvou, etrvu, trfus, mktcp, gld.gld, oil.crude, tenYr.bond, usd.usd, sp500.sp500, gglTrnd]

predictData = pd.concat(predictors, axis=1)

# Chosen features based on VIF Study 
featuresSelected = [cptrv, gld.gld]
featureData = pd.concat(featuresSelected, axis=1)

# Chosen Label to predict output
labels = [price['price']] 
labelData = pd.concat(labels, axis=1)
In [53]:
# Set chosen Features/Predictors
X = predictData.fillna(method='bfill')
X = X.fillna(method='pad')
In [54]:
# Set Label
y = labelData.fillna(method='bfill')
In [55]:
# Standardize predictors/features and calculate z-scores 
# for set of all values in each feature.
idxTrunc = 2913
zScoreX = stats.zscore(X)[:idxTrunc]

zScoreY = stats.zscore(y)
In [56]:
zScoreX = pd.DataFrame(zScoreX)
zScoreX.columns = ['diffy', 'mwntd', 'mwnus', 'mwtrv', 'avbls', 'blchs',
          'atrct', 'hrate', 'cptrv', 'etrav', 'toutv', 'ntrbl', 'naddu',
          'ntrep', 'ntrat', 'ntran', 'trfee', 'totbc', 'mirev', 'cptra',
          'trvou', 'etrvu', 'trfus', 'mktcp', 'gld', 'oil', 'bond', 'usd', 'sp500', 'demand']
zScoreX.tail(1)
#zScoreX.shape
Out[56]:
diffy mwntd mwnus mwtrv avbls blchs atrct hrate cptrv etrav ... trvou etrvu trfus mktcp gld oil bond usd sp500 demand
2912 5.254408 0.688275 2.939785 -0.397484 0.52071 2.473432 1.349198 5.472607 -0.17049 -0.577482 ... 0.743319 0.165933 -0.13404 1.77844 -0.904448 -0.410686 -0.069401 1.037088 2.075474 0.278315

1 rows × 30 columns

In [57]:
zScoreY = pd.DataFrame(zScoreY)
zScoreY.columns = ['price']
zScoreY.head(1)
Out[57]:
price
0 -0.472356

Spearman's Rank Correlation Studies

"Spearman's rank correlation coefficient is denoted as $\varrho_s$ for a population parameter and as $r_s$ for a sample statistic. It is appropriate when one or both variables are skewed or ordinal and is robust when extreme values are present. For a correlation between variables x and y, the formula for calculating the sample Spearman's correlation coefficient is given by:" $^{33}$

$$ r_s = 1 - \frac{6\cdot \sum_{i=1}^{n}{d_i^2}}{n\left ( n^2 -1 \right )} $$ Where ${d_i}$ is the difference in ranks for ${x_i}$ and ${y_i}$
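As a quick sanity check of the formula (a sketch on toy data, not the study's series), the rank-difference computation matches `scipy.stats.spearmanr` when there are no ties:

```python
import numpy as np
from scipy import stats

# Toy data: y is a monotonic but non-linear function of x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 8.0, 16.0, 32.0])

# Rank both variables and apply r_s = 1 - 6*sum(d_i^2) / (n*(n^2 - 1))
d = stats.rankdata(x) - stats.rankdata(y)
n = len(x)
r_s = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

rho, pval = stats.spearmanr(x, y)
print(r_s, rho)  # both 1.0: the relationship is perfectly monotonic
```

Pearson's r on the same data would be below 1, which is why the rank-based statistic is preferred for the skewed variables in this study.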

In [58]:
# Implement Spearman's Rank Correlation Matrix
btcCorr = btcData.corr(method='spearman')
In [59]:
# Plot heatmap
plt.figure(figsize = (35,35))
sb.heatmap(btcCorr, annot = True, linewidths = .5, cmap = "coolwarm")
Out[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e9035b2e8>

1st Test of Feature Significance using Spearman Rank pVal < 0.01

In [60]:
# Implement function to output pvals & r
def compute_corr_and_p(df1, df2):
    corrs = pd.DataFrame(index=df1.columns, columns=df2.columns, dtype=np.float64)
    pvals = corrs.copy()
    # Implement spearman
    for i, j in product(df1.columns, df2.columns):
        corrs.loc[i,j], pvals.loc[i,j] = stats.spearmanr(df1[i], df2[j])
    return corrs, pvals

dfcorrs, dfpvals = compute_corr_and_p(zScoreX, zScoreY)
In [61]:
# Set the significance level for the correlation
sigLevel = .01

# Iterate over each p-value in the series and
# select the significant variables
nsList = []
print("Statistically Significant Variables")
print("===================================")
for name, pval in zip(dfpvals.index, dfpvals['price']):
    if pval < sigLevel:
        print(str(name) + ": " + str(pval))
    else:
        nsList.append(name)
        print(str(name) + ": " + "Not Significant")
        
Statistically Significant Variables
===================================
diffy: 0.0
mwntd: 0.0
mwnus: 0.0
mwtrv: 3.200003134946954e-78
avbls: 0.0
blchs: 0.0
atrct: 3.4003050425757886e-114
hrate: 0.0
cptrv: 0.0
etrav: 2.5113538003707207e-62
toutv: 1.4907716529883192e-139
ntrbl: 0.0
naddu: 0.0
ntrep: 0.0
ntrat: 0.0
ntran: 0.0
trfee: 0.0
totbc: 0.0
mirev: 0.0
cptra: 0.0
trvou: 0.0
etrvu: 0.0
trfus: 0.0
mktcp: 0.0
gld: 3.367358628158589e-210
oil: 1.0986535677117419e-166
bond: 5.530248483047855e-136
usd: 0.0
sp500: 0.0
demand: 0.0
In [62]:
nsList
Out[62]:
[]
In [63]:
# Drop all insignificant variables from feature set
zScoreXsig = zScoreX.drop(columns=nsList)

Statistical Testing

Using an Ordinary Least Squares Regression on a single independent feature, we can model the linear relationship between the predictor and the label with a straight line. Furthermore, we can extend this concept to a Multiple Regression, and we can fit a $p$-dimensional hyperplane to our $p$ predictors.

We can display the dimensionality of our feature set in a three-dimensional plot graphically to help in gaining new insight into the price of Bitcoin based on the fit of the predictors to the model described. We can mathematically define the multiple regression as a model that describes the response variable or price, as a weighted sum of the feature set such that:

$Price = \beta_0 + \beta_1 \times NetworkDifficulty + \beta_2 \times MyWalletNumUsers + \beta_3 \times BlockchainSize + ... + \beta_{n-1} \times Gold + \beta_n \times MarketDemand $

After implementing the Multiple Linear Regression model as defined above, the data will be analyzed and the results of this work will be interpreted using the output elements defined below:

Definition of Result Elements

The left section of the 1st table displays basic information about the fit of the regression model

Element Description
Dep. Variable The variable used as the response variable in the model
Model The type of model implemented into the fit
Method How the parameters of the model were calculated
No. of Observations The number of observations (examples)
DF Residuals Degrees of freedom of the residuals. Number of observations - number of parameters
DF Model Number of parameters in the model (not including the constant term if present)

The right section of the 1st table displays information about the "goodness" of fit for the regression model

Element Description
R-squared The coefficient of determination. A statistical measure of how well the regression line approximates the real data points. This result allows you to determine how well the set of all features can describe the model.
Adj. R-squared The R-squared value is adjusted based on the number of observations and the degrees-of-freedom of the residuals
F-statistic A measure of how significant the fit is. The mean squared error of the model divided by the mean squared error of the residuals
P(F-Statistic) The probability that you would get the above statistic, given the null hypothesis that they are unrelated
Log-Likelihood The log of the likelihood function.
AIC The Akaike Information Criterion. Adjusts the log-likelihood based on the number of observations and the complexity of the model.
BIC The Bayesian Information Criterion. Similar to the AIC, but has a higher penalty for models with more parameters.

The second table provides reporting data for each of the feature coefficients

Element Description
coef The estimated value of the coefficient
std err The basic standard error of the estimate of the coefficient. More sophisticated errors are also available.
t The t-statistic is a measure of how statistically significant the coefficient is. The greater the t, the more evidence that the metric differs significantly from average; a smaller t value implies the metric is not significantly different from average.
P > |t| The p-value for the null hypothesis that the coefficient = 0. If it is less than the confidence level, often 0.05, it indicates that there is a statistically significant relationship between the term and the response.
[95.0% Conf. Interval] The lower and upper values of the 95% confidence interval

The final section displays the statistical tests used by the model to assess the distribution of the residuals

Element Description
Skewness A measure of the symmetry of the data about the mean. Normally-distributed errors should be symmetrically distributed about the mean (equal amounts above and below the line).
Kurtosis A measure of the shape of the distribution. Compares the amount of data close to the mean with those far away from the mean (in the tails).
Omnibus D'Agostino's test. It provides a combined statistical test for the presence of skewness and kurtosis.
P(Omnibus) The Omnibus statistic turned into a probability
Jarque-Bera A different test of the skewness and kurtosis
P(Jarque-Bera) The above statistic turned into a probability
Durbin-Watson A test for the presence of autocorrelation (errors that are not independent). Often important in time-series analysis
Cond. No A test for multicollinearity (if in a fit with multiple parameters, the parameters are related with each other).

Multiple Linear Regression Model Implementation

In [64]:
## fit an OLS model with intercept on the feature set
zScoreXsig = sm.add_constant(zScoreXsig)
In [65]:
est = sm.OLS(zScoreY, zScoreXsig, missing="drop").fit()
In [66]:
est.summary(alpha=.05)
Out[66]:
OLS Regression Results
Dep. Variable: price R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 1.415e+06
Date: Wed, 22 Aug 2018 Prob (F-statistic): 0.00
Time: 21:56:01 Log-Likelihood: 9845.6
No. Observations: 2913 AIC: -1.963e+04
Df Residuals: 2882 BIC: -1.944e+04
Df Model: 30
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 0.0006 0.000 3.836 0.000 0.000 0.001
diffy 0.0305 0.002 12.507 0.000 0.026 0.035
mwntd 0.0020 0.000 5.931 0.000 0.001 0.003
mwnus -0.0596 0.008 -7.194 0.000 -0.076 -0.043
mwtrv -6.325e-05 0.000 -0.374 0.708 -0.000 0.000
avbls 0.0005 0.001 0.532 0.595 -0.001 0.002
blchs -0.1953 0.016 -11.887 0.000 -0.228 -0.163
atrct 0.0026 0.000 10.010 0.000 0.002 0.003
hrate -0.0327 0.002 -14.101 0.000 -0.037 -0.028
cptrv 0.0001 0.000 0.590 0.555 -0.000 0.001
etrav 0.0001 0.000 0.728 0.466 -0.000 0.000
toutv 2.454e-05 0.000 0.141 0.888 -0.000 0.000
ntrbl 0.0030 0.001 2.884 0.004 0.001 0.005
naddu 0.0086 0.002 5.349 0.000 0.005 0.012
ntrep 0.0014 0.002 0.804 0.421 -0.002 0.005
ntrat 0.2528 0.018 13.759 0.000 0.217 0.289
ntran -0.0041 0.002 -2.166 0.030 -0.008 -0.000
trfee -0.0024 0.001 -4.391 0.000 -0.003 -0.001
totbc -0.0062 0.001 -4.996 0.000 -0.009 -0.004
mirev 0.0281 0.003 10.001 0.000 0.023 0.034
cptra 0.0193 0.001 28.545 0.000 0.018 0.021
trvou -0.0015 0.000 -3.716 0.000 -0.002 -0.001
etrvu 0.0041 0.001 5.566 0.000 0.003 0.005
trfus -0.0049 0.001 -6.152 0.000 -0.006 -0.003
mktcp 0.9588 0.002 398.860 0.000 0.954 0.964
gld -0.0029 0.000 -6.251 0.000 -0.004 -0.002
oil -0.0005 0.001 -0.864 0.388 -0.002 0.001
bond 0.0019 0.001 3.842 0.000 0.001 0.003
usd -0.0093 0.001 -12.038 0.000 -0.011 -0.008
sp500 -0.0011 0.001 -0.817 0.414 -0.004 0.002
demand 0.0044 0.000 9.850 0.000 0.004 0.005
Omnibus: 2153.209 Durbin-Watson: 1.346
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1346700.275
Skew: 2.232 Prob(JB): 0.00
Kurtosis: 108.240 Cond. No. 634.


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Variance Inflation Factors

Apply Variance Inflation Factor calculation to feature set to find any possible multicollinearity in the data set.

In [67]:
# Check for Variance Inflation Factors to check for Multi Collinearity
vif = pd.DataFrame()
vif["VIF Factor"] = [variance_inflation_factor(zScoreXsig.values, i) for i in range(zScoreXsig.shape[1])]
vif["features"] = zScoreXsig.columns
vif.round(2)
Out[67]:
VIF Factor features
0 1.00 const
1 249.41 diffy
2 5.06 mwntd
3 2907.29 mwnus
4 1.21 mwtrv
5 38.19 avbls
6 11438.64 blchs
7 2.81 atrct
8 225.59 hrate
9 2.71 cptrv
10 1.18 etrav
11 1.29 toutv
12 44.87 ntrbl
13 110.49 naddu
14 125.73 ntrep
15 14311.10 ntrat
16 155.30 ntran
17 12.59 trfee
18 64.59 totbc
19 335.14 mirev
20 19.35 cptra
21 7.02 trvou
22 22.54 etrvu
23 26.68 trfus
24 245.12 mktcp
25 9.18 gld
26 15.38 oil
27 10.86 bond
28 25.31 usd
29 80.74 sp500
30 8.37 demand

2nd Test of Feature Significance to Solve for Regression Over-Fit using VIF < 10

In [68]:
# Iterate over each VIF in the series and
# select the significant variables with the
# least amount of Variance Inflation
threshold = 10
nsList = []
print("Detecting Multicollinearity using Variance Inflation Factors")
print("============================================================")
for i in vif.index:
    if vif['VIF Factor'][i] <= threshold and est.pvalues.iloc[i] < 0.5:
        print(str(vif.features[i]) + ": " + str(vif['VIF Factor'][i]))
    else:
        nsList.append(vif.features[i])
        print(str(vif.features[i]) + ": " + "Inflated Variance with Multicollinearity")
Detecting Multicollinearity using Variance Inflation Factors
============================================================
const: 1.0000091373803768
diffy: Inflated Variance with Multicollinearity
mwntd: 5.0626348001849415
mwnus: Inflated Variance with Multicollinearity
mwtrv: Inflated Variance with Multicollinearity
avbls: Inflated Variance with Multicollinearity
blchs: Inflated Variance with Multicollinearity
atrct: 2.809539221927989
hrate: Inflated Variance with Multicollinearity
cptrv: Inflated Variance with Multicollinearity
etrav: 1.1794072333673629
toutv: Inflated Variance with Multicollinearity
ntrbl: Inflated Variance with Multicollinearity
naddu: Inflated Variance with Multicollinearity
ntrep: Inflated Variance with Multicollinearity
ntrat: Inflated Variance with Multicollinearity
ntran: Inflated Variance with Multicollinearity
trfee: Inflated Variance with Multicollinearity
totbc: Inflated Variance with Multicollinearity
mirev: Inflated Variance with Multicollinearity
cptra: Inflated Variance with Multicollinearity
trvou: 7.01649889841808
etrvu: Inflated Variance with Multicollinearity
trfus: Inflated Variance with Multicollinearity
mktcp: Inflated Variance with Multicollinearity
gld: 9.178787829336802
oil: Inflated Variance with Multicollinearity
bond: Inflated Variance with Multicollinearity
usd: Inflated Variance with Multicollinearity
sp500: Inflated Variance with Multicollinearity
demand: 8.365264602298337
In [69]:
nsList
Out[69]:
['diffy',
 'mwnus',
 'mwtrv',
 'avbls',
 'blchs',
 'hrate',
 'cptrv',
 'toutv',
 'ntrbl',
 'naddu',
 'ntrep',
 'ntrat',
 'ntran',
 'trfee',
 'totbc',
 'mirev',
 'cptra',
 'etrvu',
 'trfus',
 'mktcp',
 'oil',
 'bond',
 'usd',
 'sp500']
In [70]:
# Drop all insignificant variables from feature set
zScoreXsigVIF = zScoreXsig.drop(columns=nsList)
zScoreXsigVIF.tail(3)
Out[70]:
const mwntd atrct etrav trvou gld demand
2910 1.0 1.063467 1.349198 -0.386978 1.455317 -0.904448 0.452747
2911 1.0 1.061330 1.349198 -0.338140 0.893827 -0.904448 0.331986
2912 1.0 0.688275 1.349198 -0.577482 0.743319 -0.904448 0.278315

Multiple Linear Regression Model Implementation: 2nd Iteration and Test

In [71]:
## fit an OLS model with intercept on the reduced feature set
zScoreXsigVIF = sm.add_constant(zScoreXsigVIF)

est2 = sm.OLS(zScoreY, zScoreXsigVIF, missing="drop").fit()

est2.summary(alpha=.05)
Out[71]:
OLS Regression Results
Dep. Variable: price R-squared: 0.772
Model: OLS Adj. R-squared: 0.772
Method: Least Squares F-statistic: 1642.
Date: Wed, 22 Aug 2018 Prob (F-statistic): 0.00
Time: 21:56:20 Log-Likelihood: -1978.9
No. Observations: 2913 AIC: 3972.
Df Residuals: 2906 BIC: 4014.
Df Model: 6
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 0.0002 0.009 0.026 0.979 -0.017 0.018
mwntd 0.0744 0.014 5.396 0.000 0.047 0.101
atrct 0.0480 0.010 4.758 0.000 0.028 0.068
etrav -0.0566 0.009 -6.268 0.000 -0.074 -0.039
trvou 0.5045 0.019 27.137 0.000 0.468 0.541
gld -0.0027 0.013 -0.213 0.832 -0.027 0.022
demand 0.3612 0.019 19.085 0.000 0.324 0.398
Omnibus: 880.060 Durbin-Watson: 0.356
Prob(Omnibus): 0.000 Jarque-Bera (JB): 93804.910
Skew: -0.355 Prob(JB): 0.00
Kurtosis: 30.791 Cond. No. 4.50


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Variance Inflation Factors

Apply Variance Inflation Factor calculation to feature set to find any possible multicollinearity in the data set.

In [72]:
vif2 = pd.DataFrame()
vif2["VIF Factor"] = [variance_inflation_factor(zScoreXsigVIF.values, i) for i in range(zScoreXsigVIF.shape[1])]
vif2["features"] = zScoreXsigVIF.columns
vif2 = vif2.round(2)
vif2
Out[72]:
VIF Factor features
0 1.00 const
1 2.42 mwntd
2 1.30 atrct
3 1.04 etrav
4 4.41 trvou
5 1.99 gld
6 4.57 demand
In [73]:
pvalsModel = pd.DataFrame(est2.pvalues)
pvalsModel
Out[73]:
0
const 9.792701e-01
mwntd 7.379494e-08
atrct 2.049175e-06
etrav 4.192842e-10
trvou 9.685722e-145
gld 8.315250e-01
demand 1.350323e-76

3rd Test of Feature Significance to Solve for Regression Over-Fit using VIF < 5 and pVal < 0.05

In [74]:
# Capture predictors with pval < 0.05 && VIF < 5

nsList = []
print("Significant features,")
print("with low Variation.")
print("feature  VIF    Pval")
print("====================")
for i in vif2.features.index:   
    if(vif2['VIF Factor'][i] <= 5 and pvalsModel[0][i] < 0.05):
        print(str(vif2.index[i]), vif2.features[i] + ":   " + str(vif2['VIF Factor'][i]) + "   " + str(pvalsModel[0][i]))
        
    else:
        nsList.append(vif2.features[i])
        print(str(vif2.index[i]), vif2.features[i] + ":   " + "Variable NOT SIGNIFICANT or VARIATION TOO HIGH.")
Significant features,
with low Variation.
feature  VIF    Pval
====================
0 const:   Variable NOT SIGNIFICANT or VARIATION TOO HIGH.
1 mwntd:   2.42   7.379494115714782e-08
2 atrct:   1.3   2.0491753457349815e-06
3 etrav:   1.04   4.192842375833807e-10
4 trvou:   4.41   9.68572236145097e-145
5 gld:   Variable NOT SIGNIFICANT or VARIATION TOO HIGH.
6 demand:   4.57   1.350323404428134e-76
In [75]:
nsList
Out[75]:
['const', 'gld']
In [76]:
# Drop all insignificant variables with high variation from feature set
zScoreXsigVIF2 = zScoreXsigVIF.drop(columns=nsList)
In [77]:
# Create Categorical column using Up/Down price return
PnL = pd.DataFrame(price['ret'])
PnLarr = []

# Set NaN val to 0
PnL.ret[0] = 0

# Iterate over each return value and categorize
for row in PnL.ret:
    if row <= 0:
        PnLarr.append(0)
    else:
        PnLarr.append(1)
        
PnL['PnL'] = PnLarr

PnL = (PnL.PnL).to_frame()

PnL = PnL.reset_index(drop=True)
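As an aside, the categorization loop above can be collapsed into a single vectorized pandas expression; a minimal sketch on invented return values (not the study's data):

```python
import pandas as pd

ret = pd.Series([0.0, -1.4, 2.3, 0.0, 5.1])  # toy daily returns

# 1 where the day's return is positive, 0 otherwise (the initial 0/NaN day maps to 0)
pnl = (ret > 0).astype(int)

print(pnl.tolist())  # [0, 0, 1, 0, 1]
```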
In [78]:
# Concat PnL to standardized feature set
stdFeatures = [zScoreXsigVIF2, PnL]

stdFeatures = pd.concat(stdFeatures, axis=1)

stdFeatures.tail(5)
Out[78]:
mwntd atrct etrav trvou demand PnL
2908 1.107156 1.349198 -0.135701 1.616889 0.627179 0
2909 1.250469 1.349198 -0.274898 1.729935 0.466165 1
2910 1.063467 1.349198 -0.386978 1.455317 0.452747 0
2911 1.061330 1.349198 -0.338140 0.893827 0.331986 1
2912 0.688275 1.349198 -0.577482 0.743319 0.278315 0

Multiple Linear Regression Model: Initial Results

Below are the statistically significant features, selected from the full feature set, that the regression model uses to describe the price of Bitcoin.

In [79]:
stdFeatures.describe()
Out[79]:
mwntd atrct etrav trvou demand PnL
count 2913.000000 2913.000000 2913.000000 2913.000000 2913.000000 2913.000000
mean -0.000236 -0.000463 0.000198 -0.000255 -0.000137 0.516650
std 1.000262 1.000031 1.000286 1.000248 1.000316 0.499809
min -1.041758 -1.772186 -0.772862 -0.331152 -0.459665 0.000000
25% -1.034227 -0.299699 -0.381653 -0.329650 -0.413672 0.000000
50% -0.166955 -0.000882 -0.116392 -0.317420 -0.285619 1.000000
75% 0.818834 0.546351 0.182428 -0.238990 -0.096128 1.000000
max 3.123509 8.538820 21.225707 15.371887 13.535677 1.000000

Statistical Results & Visualizations

Visualize Model in a Six Dimensional (6D) Plot

This model can be visualized in 3-d space to display 6 Dimensions:

In [80]:
# create matplotlib canvas with 3d axes
fig = plt.figure(figsize=(18, 13))
title = fig.suptitle('6D Plot: Price - Num Txn/Day - Txn Confirmation Time - Txn Vol in USD - Market Demand - Daily Performance', fontsize=18)
#ax = fig.add_subplot(111, projection='3d')
ax = Axes3D(fig, azim=-40, elev=25)


xs = list(stdFeatures['trvou'])
ys = list(stdFeatures['demand'])
zs = list(zScoreY['price'])


data_points = [(x, y, z) for x, y, z in zip(xs, ys, zs)]

# Size of observation plot based on num users
ss = list(stdFeatures['mwntd'])
# Color determined by daily PnL
colors = ['red' if pl == 0 else 'green' for pl in list(stdFeatures['PnL'])]
# Shape determined by txn confirmation time
# Circles = Slower (atrct > 0.6)
# Squares = Faster
markers = ['o' if ct > 0.6 else ',' for ct in list(stdFeatures['atrct'])]

for data, color, size, mark in zip(data_points, colors, ss, markers):
    x, y, z = data
    graph = ax.scatter(x, y, z, alpha=0.4, c=color, edgecolors='none', s=(size*1000), marker=mark)

ax.set_xlabel('Txn Volume')
ax.set_ylabel('Demand')
ax.set_zlabel('Price')
c:\program files\python36\lib\site-packages\matplotlib\collections.py:902: RuntimeWarning: invalid value encountered in sqrt
  scale = np.sqrt(self._sizes) * dpi / 72.0 * self._factor
Out[80]:
Text(0.5,0,'Price')

6D Plot Legend

  1. Transaction Volume (x-axis):
    • Linear Regression Fitted Values Increasing along x-axis.
  2. Market Demand (y-axis):
    • Linear Regression Fitted Values Increasing along y-axis.
  3. Price (z-axis):
    • Linear Regression Fitted Values Increasing along z-axis.
  4. Bitcoin Users Number of Txns/Day (size-dim):
    • Larger elements represent more users executing transactions daily.
    • Smaller elements represent fewer users executing transactions daily.
  5. Profit and Loss (color-dim):
    • Green elements represent market "Up days".
    • Red elements represent market "Down days".
  6. Avg. Txn Confirmation Time (shape-dim):
    • Circle elements represent slower transaction times.
    • Square elements represent faster transaction times.

4D Visualization: Plotting the Model as a Hyperplane

This model can be visualized as a 2-d plane in 3-d space:

In [81]:
## Create the 3d plot
# trvou/demand grid for 3d plot
xx1, xx2= np.meshgrid(np.linspace(stdFeatures.trvou.min(), stdFeatures.trvou.max()),
                      np.linspace(stdFeatures.demand.min(), stdFeatures.demand.max())
                     )

est2.params
Out[81]:
const     0.000230
mwntd     0.074364
atrct     0.048038
etrav    -0.056560
trvou     0.504542
gld      -0.002661
demand    0.361168
dtype: float64
In [82]:
# plot the hyperplane by evaluating the parameters on the grid
Z = est2.params[0] + est2.params[4] * xx1 + est2.params[6] * xx2
In [83]:
# create matplotlib canvas with 3d axes
fig = plt.figure(figsize=(18, 13))
title = fig.suptitle('4D Plot: Price - Txn Confirmation Time - Txn Vol in USD - Market Demand', fontsize=18)
ax = Axes3D(fig, azim=-250, elev=15)

# plot hyperplane
surf = ax.plot_surface(xx1, xx2, Z, cmap=plt.cm.RdBu_r, alpha=0.65, linewidth=3)

#Capture data points for scatter plot
test = pd.DataFrame(est2.predict(zScoreXsigVIF), columns=['price'])

ss = list((zScoreXsigVIF['atrct'].abs()*100))

# plot data points - points over the HP are white, points below are black
resid = zScoreY.sub(test)

# Plot DataFrame scatter plot
ax.scatter(zScoreXsigVIF[resid.price >= 0].trvou, zScoreXsigVIF[resid.price >= 0].demand, zScoreY[resid.price >= 0], color='black', alpha=1.0, facecolor='white', s=ss)
ax.scatter(zScoreXsigVIF[resid.price < 0].trvou, zScoreXsigVIF[resid.price < 0].demand, zScoreY[resid.price < 0], color='black', alpha=1.0, s=ss)

# set axis labels
ax.set_xlabel('Txn Volume')
ax.set_ylabel('Demand')
ax.set_zlabel('Price')
Out[83]:
Text(0.5,0,'Price')

5D Plot Legend

  1. Transaction Volume (x-axis):
    • Linear Regression Fitted Values Increasing along x-axis.
  2. Market Demand (y-axis):
    • Linear Regression Fitted Values Increasing along y-axis.
  3. Price (z-axis):
    • Linear Regression Fitted Values Increasing along z-axis.
  4. Avg. Txn Confirmation Time (size-dim):
    • Bigger elements represent slower transaction times.
    • Smaller elements represent faster transaction times.
  5. Linear Regression Fitted Values (color-dim):
    • White elements represent observations above the fitted hyperplane (positive residual).
    • Black elements represent observations below the fitted hyperplane (negative residual).
In [84]:
hypotheses1 = '(const = mwntd)'
f_test = est2.f_test(hypotheses1)
print(f_test)
<F test: F=array([[20.47744089]]), p=6.276019158998907e-06, df_denom=2906, df_num=1>
In [85]:
hypotheses2 = '(const = atrct)'
f_test = est2.f_test(hypotheses2)
print(f_test)
<F test: F=array([[12.6811182]]), p=0.0003752938721939095, df_denom=2906, df_num=1>
In [86]:
hypotheses3 = '(const = trvou)'
f_test = est2.f_test(hypotheses3)
print(f_test)
<F test: F=array([[599.86786788]]), p=1.3598787855998208e-120, df_denom=2906, df_num=1>
In [87]:
hypotheses4 = '(const = demand)'
f_test = est2.f_test(hypotheses4)
print(f_test)
<F test: F=array([[298.3879435]]), p=1.0120482358859013e-63, df_denom=2906, df_num=1>

MWNTD

In [88]:
pltFig = plt.figure(figsize=(10 * 1.618, 10))

pltFig = smplt.plot_regress_exog(results=est2, exog_idx=1, fig=pltFig)

ATRCT

In [89]:
pltFig = plt.figure(figsize=(10 * 1.618, 10))
pltFig = smplt.plot_regress_exog(results=est2, exog_idx=2, fig=pltFig)

TRVOU

In [90]:
pltFig = plt.figure(figsize=(10 * 1.618, 10))
pltFig = smplt.plot_regress_exog(results=est2, exog_idx=4, fig=pltFig)

DEMAND

In [91]:
pltFig = plt.figure(figsize=(10 * 1.618, 10))
pltFig = smplt.plot_regress_exog(results=est2, exog_idx=6, fig=pltFig)

Velocity of Money & Inflationary Pressures

The ratio of total spending to the supply of money is commonly called the velocity of money, and it can be viewed from two opposing historical monetary frameworks. Modern debates over monetary policy and its effect on market and asset prices rely either on Quantity Theory, argued for extensively by economists such as Milton Friedman, or on Keynesian Theory, the monetary framework described by John Maynard Keynes in the 1930s to analyze aggregate economic relationships. The historical argument over the relationship between aggregate spending and the supply of money is the problem for which Bitcoin, drawing on the principles of Quantity Theory, proposes a solution: cap the supply of money in the economy, thereby constraining the inflationary pressures that affect prices. By hardcoding a ceiling on the total supply of money in the Bitcoin market, in this case a maximum of 21 million Bitcoin, Bitcoin promises to control the price inflation that results when a Central Authority excessively prints money to chase its monetary Velocity.
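To see where the 21 million figure comes from: the cap is the limit of Bitcoin's block-subsidy schedule, which starts at 50 BTC per block and halves every 210,000 blocks. A back-of-the-envelope check in Python (float arithmetic for readability; the actual protocol truncates integer satoshis each halving, so the true ceiling sits slightly below 21 million):

```python
# Bitcoin's block subsidy: 50 BTC at launch, halving every 210,000 blocks.
# Summing the geometric series approximates the hardcoded supply ceiling.
blocks_per_era = 210000
total = sum(blocks_per_era * (50.0 / 2 ** era) for era in range(33))
print(total)  # just under 21,000,000 BTC
```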

There is a great deal of academic and economic study pertaining to the liquidity preferences of the market participants and why people make decisions about whether to use Money as a Medium of Exchange or as a Store of Value. Indeed, Keynes' work focused heavily on these different variables affecting the larger set of features which he believed provided a better explanation for the supply of money and its impact on inflation and ultimately, monetary policy. His work led to the quantification of many of the models studied and analyzed by the Federal Reserve today. Due to differences in liquidity preferences, however, and differences in views towards monetary policy, it has proven difficult to test and define a model that could serve as a platform to help quantify the differences between these two monetary theories.

"The income velocity of money is defined as the ratio of nominal income (that is, the dollar value of income at current prices) to the money stock. If Y represents the real quantity of goods, and services produced and P, the average price paid for these goods and services, then PY is the value of nominal income and V [=PY/M] is the income velocity of money, where M represents the money stock. Income velocity measures the average number of times in a given period each dollar is spent for currently produced goods and services."$^{34}$

"An alternative measure of velocity was often used by quantity theorists. If T, the total number of purchases financed by monetary exchange, rather than Y is used as the measure of transactions, the transactions velocity of money can be defined as V' = P'T/M, where P' is the average price of all transactions. The focus in this article is on the effect of the money supply on income and economic activity. For this purpose, the income velocity of money, V, is more useful than the more inclusive transactions velocity of money, V'."$^{34}$

Taking the information presented, we plot the data below for further analysis.

In [92]:
# Derive txnVelocity & incomeVelocity & Aggregate Demand for each
txnVelocity = (ntran.ntran.mul(price.price)).div(totbc.totbc).rename(columns={"Value": "txnVelocity"})
incomeVelocity = mirev.mirev.div(totbc.totbc).rename(columns={"Value": "incomeVelocity"})
moneySupply = totbc.totbc.rename(columns={"Value": "Money Stock"})

priceplt = price.price.plot(label="Price", figsize=(18,10), logy=True, legend=True)

# Plot total money supply in circulation
moneySupply = moneySupply.plot(ax=priceplt, title='Money Supply', label='Money Stock', figsize=(18,10), logy=True, legend=True)

# Plot the transaction Velocity & Aggregate Demand
incV = incomeVelocity.plot(ax=moneySupply, label='incomeVelocity', figsize=(18,10), logy=True, legend=True)
txnVelocity.plot(ax=incV, title='Transaction Velocity vs. Income Velocity', label='txnVelocity', figsize=(18,10), logy=True, legend=True)
Out[92]:
<matplotlib.axes._subplots.AxesSubplot at 0x14e92e44eb8>
In [93]:
incomeVelocity[2912]
Out[93]:
0.7538330722566181
In [94]:
txnVelocity[2912]
Out[94]:
71.37934534678435

The data seems to suggest that as the money supply of Bitcoin expands, the price of Bitcoin increases, as argued from the perspective of Quantity Theory. Conversely, depending on the liquidity preferences of the market participants, the data suggests that as the yield expectation has fallen since the start of 2018, because the supply of money has plateaued as shown in the plot, the Velocity of Money has also declined. In practice, then, the price of Bitcoin appears best described by a hybrid model that considers both the Quantity and Keynesian theories.

"Inflation can be a purely velocity inflation as easily as a quantity inflation... If for any psychological reason the people's liquidity preference should fall by half and velocity double, equilibrium prices must surely double even though money supply remains unchanged. The reverse is true if people's preference shifts drastically toward holding money and not spending it. Prices must fall. A purely velocity inflation is usually quite volatile. If velocity rises sharply for some psychological and spontaneous reason, unaccompanied by money quantity inflation, velocity will usually return to its norm about as quickly as it departed from it. Sharp velocity inflations left to themselves are almost never permanent. This truth is precisely the opposite of quantity inflation, for a price rise based on money quantity is as irrecoverable as money quantity itself."$^{21}$

The velocity description seems to agree with Quantity Theory: as Bitcoins were introduced into the market by the miners who provide the network infrastructure, leading to a measured increase in the money supply as programmed by Satoshi Nakamoto, an increase in the price level of the same proportion is observed in the plot shown. As the supply of Bitcoin has approached the 21 million limit hardcoded into the network, the price has declined, reflecting the Keynesian argument that although the market has observed increased expenditures by its participants, it has not witnessed an increase in the price level. As Keynes described, although the change in money supply is known, the price level depends on many things that may not be accounted for.$^{34}$

Learning & Modeling

In [95]:
tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

# Clean data and remove NaN values
tfData = btcData.fillna(method='bfill')
tfData = tfData.fillna(method='pad')

# Load Data Set
# Randomize Data
tfData = tfData.reindex(
    np.random.permutation(tfData.index))

# Examine the Data
tfData.describe()
Out[95]:
price ret diffy mwntd mwnus mwtrv avbls blchs atrct hrate ... trvou etrvu trfus mktcp gld crude bond usd sp500 bitcoin
count 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 ... 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0 2914.0
mean 1384.8 0.6 454271768200.9 30712.0 5607152.9 114619.9 0.4 46244.0 8.2 3464075.1 ... 112865626.1 269907957.5 290198.7 22729241085.1 1367.3 74.8 145.5 87.4 1873.0 15436.3
std 2929.3 7.0 1129730189494.0 29486.0 7554264.4 271162.1 0.4 53828.1 4.6 8654449.2 ... 340885219.5 619544737.0 1375146.9 49544569017.3 188.1 23.3 12.3 8.3 472.4 33587.4
min 0.1 -64.6 623.4 0.0 2.0 0.0 0.0 1.0 0.0 0.0 ... 51.3 556.0 0.0 239284.5 1051.7 28.3 117.3 73.1 1045.0 0.0
25% 12.2 -1.4 2440642.6 224.8 19715.5 0.0 0.1 2290.0 6.8 18.3 ... 512898.2 2383051.0 267.4 114628620.8 1238.2 50.9 136.8 80.1 1403.2 1545.2
50% 293.1 0.1 23844670038.8 25806.0 2078320.0 61360.7 0.3 20891.0 8.2 192615.3 ... 4681991.1 50594615.5 5703.9 4148214508.0 1311.1 81.6 145.3 83.9 1930.4 5862.1
75% 703.8 1.9 217375482757.0 54808.2 8385980.5 158798.2 0.8 79874.4 10.7 1622629.2 ... 31466779.9 183488992.5 40049.0 10257932618.9 1482.1 96.4 153.4 95.3 2145.5 12208.2
max 19498.7 173.0 6389316883510.0 122796.0 27811254.0 8484599.7 1.2 179361.2 47.7 53994493.1 ... 5352015515.5 5760245259.9 22724840.7 326525438567.0 1893.6 114.4 177.1 103.3 2874.8 469986.1

8 rows × 32 columns

In [96]:
# Build the model
# Define the input feature: tfData. pull data from
# btcData & define feature column as numeric
tf_feature = tfData[["diffy", "mwnus", "mwtrv", "avbls", "blchs",
          "atrct", "hrate", "cptrv", "etrav", "toutv", "ntrbl", "naddu",
          "ntrep", "ntrat", "ntran", "trfee", "totbc", "mirev", "cptra",
          "trvou", "etrvu", "trfus", "mktcp", "gld", "crude", "bond", "usd", "sp500", "bitcoin"]]

# Configure a numeric feature column for tfData cols.
tf_feature_columns = [tf.feature_column.numeric_column("diffy"),tf.feature_column.numeric_column("mwnus"),
           tf.feature_column.numeric_column("mwtrv"),tf.feature_column.numeric_column("avbls"),
           tf.feature_column.numeric_column("blchs"),tf.feature_column.numeric_column("atrct"),
           tf.feature_column.numeric_column("hrate"),tf.feature_column.numeric_column("cptrv"),
           tf.feature_column.numeric_column("etrav"),tf.feature_column.numeric_column("toutv"),
           tf.feature_column.numeric_column("ntrbl"),tf.feature_column.numeric_column("naddu"),
           tf.feature_column.numeric_column("ntrep"),tf.feature_column.numeric_column("ntrat"),
           tf.feature_column.numeric_column("ntran"),tf.feature_column.numeric_column("trfee"),
           tf.feature_column.numeric_column("totbc"),tf.feature_column.numeric_column("mirev"),
           tf.feature_column.numeric_column("cptra"),tf.feature_column.numeric_column("trvou"),
           tf.feature_column.numeric_column("etrvu"),tf.feature_column.numeric_column("trfus"),
           tf.feature_column.numeric_column("mktcp"),tf.feature_column.numeric_column("gld"),
           tf.feature_column.numeric_column("crude"),tf.feature_column.numeric_column("bond"),
           tf.feature_column.numeric_column("usd"),tf.feature_column.numeric_column("sp500"),
           tf.feature_column.numeric_column("bitcoin")]

# Define the label, the target to train for future predictions.
targets = tfData["price"]
In [97]:
# Configure the Linear Regressor
# Train model using GradientDescentOptimizer & implement
# Mini-Batch Stochastic Gradient Descent. The LearningRate 
# argument controls the size of the gradient step.
# Gradient Clipping also applied to optimizer with 
# clip_gradients_by_norm to ensure the magnitude of the
# gradients do not become too large during training, which
# can cause gradient descent to fail

# Use gradient descent as the optimizer for training the model.
tf_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.000000001)
tf_optimizer = tf.contrib.estimator.clip_gradients_by_norm(tf_optimizer, 2.0)

# Configure the linear regression model with our feature columns and optimizer.
# Set a learning rate of 0.000000001 for Gradient Descent.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns = tf_feature_columns,
    optimizer = tf_optimizer
)
In [98]:
# Define the input function
# Need to instruct TF how to preprocess data, batch, shuffle
# and repeat for model training

# First convert pandas feature df into a dict of numpy arrays
# Use TF Dataset API to construct dataset object from data,
# to then break data into batches of batch_size, to be repeated
# for a specific number of epochs (num_epochs).
# When default value of num_epochs = None is passed into
# repeat(), the input data will repeat indefinitely
# IF shuffle is set to True, shuffle data and pass to 
# model randomly during training. The buffer_size argument
# specifies size of dataset that shuffle will randomly sample.
# Input function must construct an iterator for dataset and returns 
# next batch of data to linear regressor.

def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Trains a linear regression model.
  
    Args:
      features: pandas DataFrame of features
      targets: pandas DataFrame of targets
      batch_size: Size of batches to be passed to the model
      shuffle: True or False. Whether to shuffle the data.
      num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely
    Returns:
      Tuple of (features, labels) for next data batch
    """
  
    # Convert pandas data into a dict of np arrays.
    features = {key:np.array(value) for key,value in dict(features).items()}                                           
 
    # Construct a dataset, and configure batching/repeating.
    ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)
    
    # Shuffle the data, if specified.
    if shuffle:
      ds = ds.shuffle(buffer_size=10000)
    
    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    
    return features, labels
In [99]:
# Train the model
# can now call train() on linear_regressor to train the model. 
# wrap my_input_fn in a lambda to pass in my_feature and target as 
# arguments, and to start, train model for 100 steps.

_ = linear_regressor.train(
    input_fn = lambda:my_input_fn(tf_feature, targets),
    steps=100
)

Prediction

In [100]:
# Make predictions on training data, see how well model fit during training.

# NOTE: Training error measures how well model fits the training data, but it does not 
# measure how well model generalizes to new data. Not testing for generalization

# Create input function for predictions.
prediction_input_fn =lambda: my_input_fn(tf_feature, targets, num_epochs=1, shuffle=False)

# Call predict() on the linear_regressor to make predictions.
predictions = linear_regressor.predict(input_fn=prediction_input_fn)

# Format predictions as a NumPy array, to calculate error metrics.
predictions = np.array([item['predictions'][0] for item in predictions])
In [102]:
# Compare predictions to targets
calibration_data = pd.DataFrame()
calibration_data["predictions"] = pd.Series(predictions)
calibration_data["targets"] = pd.Series(targets.values)
calibration_data.describe()
Out[102]:
predictions targets
count 2914.0 2914.0
mean 1624.0 1384.8
std 3497.9 2929.3
min 0.1 0.1
25% 6.7 12.2
50% 273.1 293.1
75% 732.7 703.8
max 19708.2 19498.7

Evaluation

In [103]:
# Evaluate Model
# Print Mean Squared Error and Root Mean Squared Error.
mean_squared_error = metrics.mean_squared_error(predictions, targets)
root_mean_squared_error = math.sqrt(mean_squared_error)
print ("Mean Squared Error (on training data): %0.3f" % mean_squared_error)
print ("Root Mean Squared Error (on training data): %0.3f" % root_mean_squared_error)
Mean Squared Error (on training data): 643486.670
Root Mean Squared Error (on training data): 802.176
In [104]:
# Judge the scale of the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).
# Compare the RMSE to the difference of the min and max of our targets:

min_btc_value = tfData["price"].min()
max_btc_value = tfData["price"].max()
min_max_difference = max_btc_value - min_btc_value

print ("Min. Median Bitcoin Value: %0.3f" % min_btc_value)
print ("Max. Median Bitcoin Value: %0.3f" % max_btc_value)
print ("Difference between Min. and Max.: %0.3f" % min_max_difference)
print ("Root Mean Squared Error: %0.3f" % root_mean_squared_error)
Min. Median Bitcoin Value: 0.061
Max. Median Bitcoin Value: 19498.683
Difference between Min. and Max.: 19498.622
Root Mean Squared Error: 802.176
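The RMSE can be put in context by expressing it as a share of the full spread of observed prices, using the two values printed above:

```python
rmse = 802.176
price_range = 19498.622  # max - min of the target values printed above

ratio = rmse / price_range
print("RMSE as a share of the target range: %.1f%%" % (ratio * 100))  # 4.1%
```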

Combined Results

A total of 2,912 observations were available for the implementation of the machine learning models, allowing 30 features to be tested against a response variable with daily data whose time-series study began on 4 January 2009. The distribution of price according to the daily performance of the asset suggested that a participant in the market for Bitcoin can expect more frequent “down days,” or days with a closing price lower than the previous day’s closing price, than “up days.” Distribution plots for each of the features also showed that the data in the feature set was not normally distributed, with an extreme amount of skew visible in each plot. After standardizing the regression inputs into z-scores to eliminate the differing units of the variables, the authors implemented a Spearman’s Rank correlation matrix to account for the skew of the feature data set. The first round of statistical testing attempted to eliminate features with a high degree of multicollinearity that displayed an insignificant p-value for their Spearman Rank correlation score; this test suggested that all of the chosen features were significant when measured against their Spearman Rank. The first implementation of an Ordinary Least Squares regression model produced an R-squared value of 1.000, leading to the conclusion that the model “over-fit” the data. After correcting for multicollinearity by eliminating features whose Variance Inflation Factors exceeded 10 or whose p-values were not less than 0.05, the study presents a second iteration of testing to implement the final Ordinary Least Squares regression, whose R-squared result explains 77.2% of the variation in the response variable, price.
The final corrections for multicollinearity eliminated features with Variance Inflation Factors greater than 5 and retained statistically significant features whose p-values were less than 0.05. The model suggested that the features associated with the number of transactions, transaction confirmation times, transaction USD volume, and market demand measured using Google Search Trends data account for the prices observed in the Bitcoin markets today. Faster confirmation times, demand as measured by daily Google searches, and transaction volume explain 77.2% of the price of Bitcoin, as described by the R-squared result. After measuring the properties described as the “Velocity of Money,” the data appears to suggest that as the money supply of Bitcoin expanded, the price of Bitcoin increased. Conversely, depending on the liquidity preferences of the market participants, the data suggests that as the yield expectation has fallen since the start of 2018, because the supply of new Bitcoin flattens out over time, the Velocity of Money has also declined. As the rate of Bitcoin creation decreases, the “Velocity of Money” data presented suggests that the price of Bitcoin has begun to stabilize, and its hardcoded ceiling on the supply of Bitcoin in circulation may validate the anti-inflationary stance taken by its creators. A final multiple linear regression model implemented using TensorFlow produced predicted values for the response variable with a Root Mean Squared Error (RMSE) of 802.176, approximately 4.1% of the range of the observed target values.

Conclusions

The demand for Bitcoin as a Store of Value and as a Medium of Exchange provides credible evidence that the solutions offered by Bitcoin deliver value to the people of the world today. Enabling private transactions on a peer-to-peer network with easy access to the global economy creates a frictionless digital environment that gives people a way to conduct trade privately and securely. The data shows that the price of Bitcoin depends on demand and that, due to its limited supply, its engineered anti-inflationary properties suggest it is entering a more stable valuation. The measured values of velocity that pertain to Bitcoin show that as its creation and circulation have plateaued, its velocity has reverted toward its mean, as reflected in its current pricing. Bitcoin and other digital currencies help alleviate poverty, as shown in case studies in Kenya; Bitcoin provides easy access to global markets from an inexpensive device; it allows people to conduct trade in a genuinely free market; it is free from the manipulation of a Central Authority and is inherently anti-inflationary; it is engineered with properties that were never before thought possible for a currency; and it is disrupting the existing financial system because it adds value unencumbered by regulatory constraints. Bitcoin is in demand because it serves a market that existing commercial interests cannot serve, owing to Anti-Money Laundering regulations, Know-Your-Customer requirements, and an existing industry specification that does not scale well to meet the needs of the entire world. In addition to its engineered monetary properties, Bitcoin has a distributed architecture that fosters the trust of its users and cannot be controlled, seized, or shut down by a Central Authority, ensuring that its value cannot be arbitrarily manipulated. The features in this study that describe demand explain 77.2% of the price of Bitcoin, as shown by the R-squared result.
If Bitcoin continues to provide value to society by facilitating global economic trade securely, transparently, and at a reduced cost, the model suggests that its value will continue to reflect its demand, as shown by the Google Search Trends data presented. Studies of its Velocity highlight aspects of Quantity Theory that seem to validate themselves in what this paper considers a global experiment in Bitcoin. In constraining the supply of Bitcoin in circulation, and after measuring its velocity, it is notable that as the monetary supply of Bitcoin levels off, its velocity contracts and the price of the currency stabilizes. By regulating the circulation of additional units of currency, Bitcoin’s monetary policy and technical constraints apply Quantity Theory to engineer its very stability.

Acknowledgements

The original research summarized in the following was supported, in part, by U.S. Department of Education grant awards P031C160161 (STEM SPACE), P031C160143 (STEM EngInE), P120A160036 (STEM ISLE), U.S. Department of Agriculture grant award 2016-38422-25549 (iCATCH grant), and National Science Foundation Subaward UFDSP00011889 (Florida Pathways to Success). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the respective funding agency.

References & Sources

$^{1}$ Bitcoin Wiki. "This wiki is maintained by the Bitcoin community". Bitcoin Foundation. en.bitcoin.it/wiki/Main_Page. Accessed 3 May 2018.

$^{2}$ Dai, Wei. "A scheme for a group of untraceable digital pseudonyms to pay each other with money and to enforce contracts amongst themselves without outside help". bmoney.txt, www.weidai.com/bmoney.txt. Accessed 10 May 2018.

$^{3}$ PKI: Implementing and Managing E-Security. RSA Press, 2001, p. 23.

$^{4}$ PKI: Implementing and Managing E-Security. RSA Press, 2001, p. 30, Table 2-1.

$^{5}$ Nakamoto, Satoshi. "Bitcoin: A Peer-to-Peer Electronic Cash System". Bitcoin Foundation, bitcoin.org. Accessed 20 May 2018.

$^{6}$ Lopp, Jameson. "Bitcoin and the Rise of the Cypherpunks". CoinDesk, www.coindesk.com/the-rise-of-the-cypherpunks/. Accessed 5 June 2018.

$^{7}$ Hughes, Eric. "A Cypherpunk's Manifesto". Activism.net, www.activism.net/cypherpunk/manifesto.html. Accessed 24 May 2018.

$^{8}$ Levy, Steven. "Crypto Rebels". Wired, www.wired.com/1993/02/crypto-rebels/. Accessed 15 June 2018.

$^{9}$ Back, Adam. "A partial hash collision based postage scheme". HashCash.org, www.hashcash.org/papers/announce.txt. Accessed 12 July 2018.

$^{10}$ Rao, Justin M., and David H. Reiley. "The Economics of Spam". Journal of Economic Perspectives, vol. 26, no. 3, Summer 2012, pp. 87-110, www.aeaweb.org/articles?id=10.1257/jep.26.3.87. Accessed 30 June 2018.

$^{11}$ Weinswig, Deborah. "Deep Dive: An Introduction to Cybersecurity: Components of an Advanced Attack, Characteristics of an Advanced Persistent Threat, and Types of Attacks and Hackers". Fung Global Retail & Technology, figure 7, pg. 11, www.fungglobalretailtech.com. Accessed 20 July 2018.

$^{12}$ Crowe, Jonathan. "Ransomware by the Numbers: Must-Know Ransomware Statistics 2016". Barkly, blog.barkly.com/ransomware-statistics-2016. Accessed 16 July 2018.

$^{13}$ Back, Adam. "Bitcoin". HashCash.org, www.hashcash.org/bitcoin/. Accessed 21 July 2018.

$^{14}$ Back, Adam. "export-a-crypto-system-sig". CypherSpace.org, www.cypherspace.org/adam/rsa/. Accessed 19 July 2018.

$^{15}$ Demirguc-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. "Measuring Financial Inclusion and the Fintech Revolution". The Global Findex Database, 2017, pp. 35-41. Wiley Online Library, globalfindex.worldbank.org/.

$^{16}$ Alexandre, Claire. "10 Things You Thought You Knew about M-PESA". CGAP, www.cgap.org/blog/10-things-you-thought-you-knew-about-m-pesa. Accessed 30 July 2018.

$^{17}$ Dawson, Stella. "Why Does M-PESA Lift Kenyans Out of Poverty?". CGAP, www.cgap.org/blog/why-does-m-pesa-lift-kenyans-out-poverty. Accessed 30 July 2018.

$^{18}$ "Blockchain Charts & Statistics API". BLOCKCHAIN LUXEMBOURG S.A., Charts & Statistics API. Quandl, www.quandl.com/data/BCHAIN-Blockchain.

$^{19}$ "Credit Card Processing Fees and Rates Explained". SquareUp, squareup.com/guides/credit-card-processing-fees-and-rates. Accessed 30 July 2018.

$^{20}$ Jack, William and Tavneet Suri. "The Mobile Money Revolution". Georgetown University Initiative on Innovation, Development and Evaluation, www.theigc.org/wp-content/uploads/2016/03/1.-Sarah-Logan.pdf. Accessed 30 July 2018.

$^{21}$ Parsson, Jens O. Dying of Money: Lessons of the Great German and American Inflations. Wellspring Press, 1974.

$^{22}$ Antonopoulos, Andreas M. “The Killer App: Engineering the Properties of Money”. YouTube, uploaded by aantonop, 10 June 2017, www.youtube.com/watch?v=MxIrc1rxhyI.

$^{23}$ "2030 Agenda for Sustainable Development Goals". United Nations Development Programme, www.undp.org/content/undp/en/home/sustainable-development-goals/goal-1-no-poverty/targets.html. Accessed 5 August 2018.

$^{24}$ Lo, Stephanie, and J. Christina Wang. "Bitcoin as Money?". Current Policy Perspectives: Federal Reserve Bank of Boston, No. 14-4, www.bostonfed.org/-/media/Documents/Workingpapers/PDF/cpp1404.pdf. Accessed 10 August 2018.

$^{25}$ Bitcoin Wiki. "Confirmation". Bitcoin Foundation, en.bitcoin.it/wiki/Confirmation. Accessed 3 May 2018.

$^{26}$ "Chargeback 101: Credit Card Chargebacks Explained". SquareUp, squareup.com/townsquare/what-is-a-chargeback-what-makes-it-happen. Accessed 30 July 2018.

$^{27}$ Trillo, Manny. "Stress Test Prepares VisaNet for the Most Wonderful Time of the Year". Visa Viewpoints, www.visa.com/blogarchives/us/2013/10/10/stress-test-prepares-visanet-for-the-most-wonderful-time-of-the-year/index.html. Accessed 2 August 2018.

$^{28}$ Orcutt, Mike. "Bitcoin and Ethereum have a hidden power structure, and it’s just been revealed". MIT Technology Review, www.technologyreview.com/s/610018/bitcoin-and-ethereum-have-a-hidden-power-structure-and-its-just-been-revealed/. Accessed 9 August 2018.

$^{29}$ Poon, Joseph, and Thaddeus Dryja. "The Bitcoin Lightning Network: Scalable Off-Chain Instant Payments". Coinshp.com, coinshp.com/assets/pdf/lightning.pdf. Accessed 12 August 2018.

$^{30}$ "Bitcoin Search Trends". GoogleTrends API. Google, DataSource: trends.google.com/trends/explore?date=2009-01-03%202018-08-12&q=bitcoin.

$^{31}$ maliky. Comment on "Trends with daily granularity #174". GitHub, 7 Jan 2018, 7:08 a.m. (EST), github.com/GeneralMills/pytrends/issues/174.

$^{32}$ whuber. Comment on "Regression: Transforming Variables". Stack Exchange, 23 Nov 2010, 17:55, stats.stackexchange.com/questions/4831/regression-transforming-variables/4833#4833.

$^{33}$ Mukaka, MM. "A guide to appropriate use of Correlation coefficient in medical research". US National Library of Medicine, National Institutes of Health, www.ncbi.nlm.nih.gov/pmc/articles/PMC3576830/. Accessed 11 August 2018.

$^{34}$ Higgins, Bryon. "Velocity: Money's Second Dimension". Economic Review: Federal Reserve Bank of Kansas City, June 1978, www.kansascityfed.org/PUBLICAT/ECONREV/EconRevArchive/1978/2q78higg.pdf. Accessed 15 August 2018.